Abstract. Let the sample correlation matrix be $W = YY^T$, where $Y = (y_{ij})_{p,n}$ with $y_{ij} = x_{ij}/\sqrt{\sum_{j=1}^n x_{ij}^2}$. We assume $\{x_{ij} : 1 \le i \le p, 1 \le j \le n\}$ to be a collection of independent symmetrically distributed random variables with sub-exponential tails. Moreover, for any $i$, we assume $x_{ij}$, $1 \le j \le n$, to be identically distributed. We assume $0 < p < n$ and $p/n \to y$ with some $y \in (0,1)$ as $p, n \to \infty$. In this paper, we provide the Tracy-Widom law ($TW_1$) for both the largest and smallest eigenvalues of $W$. If the $x_{ij}$ are i.i.d. standard normal, we can derive the $TW_1$ law for both the largest and smallest eigenvalues of the matrix $\mathcal{R} = RR^T$, where $R = (r_{ij})_{p,n}$ with $r_{ij} = (x_{ij} - \bar{x}_i)/\sqrt{\sum_{j=1}^n (x_{ij} - \bar{x}_i)^2}$ and $\bar{x}_i = n^{-1}\sum_{j=1}^n x_{ij}$.

1. Introduction 

Suppose we have a $p$-dimensional distribution with mean $\mu$ and covariance matrix $\Sigma$. In the last three or four decades, in many research areas, including signal processing, network security, image processing, genetics, stock markets and other economic problems, people have been interested in the case where $p$ is quite large or proportional to the sample size. Naturally, one may ask how to test the independence among the $p$ components of the population. From the principal component analysis point of view, the independence test statistic is usually the maximum eigenvalue of the sample covariance matrix. Under the additional normality assumption, Johnstone [12] derived the asymptotic distribution of the largest eigenvalue of the sample covariance matrix to study the test $H_0 : \Sigma = I$ assuming $\mu = 0$.

However, sample covariance matrices are not scale-invariant. So if $\mu = 0$, Johnstone [12] proposes to perform principal component analysis (PCA) by the maximum eigenvalue of the matrix $W = YY^T$, where

(1.1)
$$Y = (y_{ij})_{p,n} := \begin{pmatrix} \frac{x_{11}}{\|x_1\|} & \frac{x_{12}}{\|x_1\|} & \cdots & \frac{x_{1n}}{\|x_1\|} \\ \vdots & \vdots & & \vdots \\ \frac{x_{p1}}{\|x_p\|} & \frac{x_{p2}}{\|x_p\|} & \cdots & \frac{x_{pn}}{\|x_p\|} \end{pmatrix}.$$

Here $x_i = (x_{i1}, \dots, x_{in})^T$ contains $n$ observations for the $i$-th component of the population, $i = 1, \dots, p$, and $\|\cdot\|$ represents the vector norm.
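The construction of $W$ from the data can be illustrated with a minimal sketch. It uses only the Python standard library; the dimensions, the Gaussian entries, the fixed seed, and the helper names `row_normalize` and `gram` are illustrative assumptions, not part of the paper. The sketch normalizes each row by its Euclidean norm, forms $W = YY^T$, and checks numerically that rescaling a row leaves $W$ unchanged.

```python
import math
import random


def row_normalize(x):
    """Divide each row of x by its Euclidean norm (the matrix Y)."""
    return [[v / math.sqrt(sum(u * u for u in row)) for v in row] for row in x]


def gram(y):
    """Return the p x p matrix Y Y^T for a list-of-rows matrix Y."""
    return [[sum(a * b for a, b in zip(r, s)) for s in y] for r in y]


rng = random.Random(0)  # fixed seed: an illustrative choice, not from the paper
p, n = 4, 50            # hypothetical small dimensions with p < n
x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]

w = gram(row_normalize(x))

# Rescale the first row by 3: W should not change (scale invariance).
x_scaled = [[3.0 * v for v in row] if i == 0 else list(row)
            for i, row in enumerate(x)]
w_scaled = gram(row_normalize(x_scaled))
max_diff = max(abs(w[i][k] - w_scaled[i][k])
               for i in range(p) for k in range(p))
```

Every diagonal entry of `w` equals 1 up to rounding, which is why the trace of $W$ is always $p$.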

Performing PCA on $W$ amounts to PCA on the sample correlations of the original data if $\mu = 0$. So for simplicity, we call $W$ the sample correlation matrix in this paper. From now on, the eigenvalues of $W$ will be denoted by

$$0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_p.$$

Then the empirical spectral distribution (ESD) of $W$ is defined by

$$F_p(x) = \frac{1}{p}\sum_{i=1}^{p} 1_{\{\lambda_i \le x\}}.$$

The asymptotic property of $F_p$ was studied in [11] and [2]. For the almost sure convergence of $\lambda_1$ and $\lambda_p$, see [11].

In this paper, we will study the fluctuations of the extreme eigenvalues $\lambda_1, \lambda_p$ of $W$ for a general population, including the multivariate normal one. The basic assumption on the distribution of our population throughout the paper is

Condition $C_1$. We assume the $x_{ij}$ are independent symmetrically distributed random variables with variance 1. For any $i$, we assume $x_{i1}, \dots, x_{in}$ to be i.i.d. Furthermore, we require that the distributions of the $x_{ij}$'s have sub-exponential tails, i.e., there exist positive constants $C, C'$ such that for all $1 \le i \le p$, $1 \le j \le n$ one has

$$\mathbb{P}(|x_{ij}| \ge t^{C}) \le e^{-t}$$

for all $t \ge C'$. We also assume $p/n \to y$ as $p, n = n(p) \to \infty$, where $0 < y < 1$.

Remark 1.1. We use $C, C_0, C_1, C_2, C', O(1)$ to denote some positive constants independent of $p$, which may differ from line to line. We use $C_\alpha$ to denote some positive constants depending on the parameter $\alpha$. The notations $\|\cdot\|_{op}$ and $\|\cdot\|_F$ represent the operator norm and the Frobenius norm of a matrix respectively, and $\|\cdot\|$ represents the Euclidean norm of a vector.

Remark 1.2. The sample correlation matrix $W$ is invariant under scaling of the elements $x_{ij}$, so the assumption $\mathrm{Var}(x_{ij}) = 1$ is not actually necessary. We set it to 1 here just for convenience. Owing to the sub-exponential tails, we can always truncate the variables so that $|x_{ij}| \le K$ with some $K \ge \log^{O(1)} n$.

A special sample correlation matrix model is the Bernoulli case, i.e. $x_{ij}$ takes the value 1 or $-1$ with equal probability. Notice that if the $x_{ij}$ are $\pm 1$ Bernoulli, we always have for all $1 \le i \le p$

$$\|x_i\|^2 = x_{i1}^2 + \cdots + x_{in}^2 = n.$$

As a consequence, the sample correlation matrix with Bernoulli elements coincides with its corresponding sample covariance matrix, for which the limiting distributions of the extreme eigenvalues are well known under some moment assumptions. One can refer to [3], [10], [13], [15] and [20]. We only summarize their results for the special Bernoulli case as the following theorem.
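The coincidence of the two matrices in the Bernoulli case can be checked numerically. The following is a minimal sketch using only the Python standard library; the dimensions and the fixed seed are illustrative assumptions. It draws $\pm 1$ entries, confirms that every row has squared norm $n$, and compares the sample correlation matrix with the sample covariance matrix $n^{-1}XX^T$ built from the unnormalized data.

```python
import math
import random

rng = random.Random(1)  # fixed seed: an illustrative choice
p, n = 3, 20            # hypothetical small dimensions
x = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]

# Every row of +-1 entries has squared Euclidean norm exactly n.
squared_norms = [sum(v * v for v in row) for row in x]

# Sample correlation matrix W = Y Y^T with y_ij = x_ij / ||x_i||.
y = [[v / math.sqrt(s) for v in row] for row, s in zip(x, squared_norms)]
w = [[sum(a * b for a, b in zip(r, t)) for t in y] for r in y]

# Sample covariance matrix (1/n) X X^T built from the raw entries.
s_cov = [[sum(a * b for a, b in zip(r, t)) / n for t in x] for r in x]

max_gap = max(abs(w[i][k] - s_cov[i][k]) for i in range(p) for k in range(p))
```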

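Theorem 1.1. (Bernoulli case) For the matrix $W$ in (1.1), if the $x_{ij}$ are $\pm 1$ Bernoulli variables, we have

$$\frac{n\lambda_p - (p^{1/2} + n^{1/2})^2}{(n^{1/2} + p^{1/2})(p^{-1/2} + n^{-1/2})^{1/3}} \xrightarrow{d} TW_1$$

and

$$\frac{n\lambda_1 - (p^{1/2} - n^{1/2})^2}{(n^{1/2} - p^{1/2})(p^{-1/2} - n^{-1/2})^{1/3}} \xrightarrow{d} TW_1$$

as $n \to \infty$ with $p/n \to y \in (0,1)$.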

Here $TW_1$ is the Tracy-Widom distribution of type 1, which was first introduced by Tracy and Widom in [19] for the Gaussian orthogonal ensemble. The distribution function $F_1(t)$ of $TW_1$ admits the representation

$$F_1(t) = \exp\left(-\frac{1}{2}\int_t^{\infty} \left[q(x) + (x - t)q(x)^2\right] dx\right),$$

where $q$ satisfies the Painlevé II equation

$$q'' = tq + 2q^3, \qquad q(t) \sim \mathrm{Ai}(t) \quad \text{as } t \to \infty.$$

Here $\mathrm{Ai}(t)$ is the Airy function.

The main purpose of this paper is to generalize Theorem 1.1 to populations satisfying the basic condition $C_1$. Our main results are the following two theorems.

Theorem 1.2. Let $W$ be a sample correlation matrix satisfying the basic condition $C_1$. We have

$$\frac{n\lambda_p - (p^{1/2} + n^{1/2})^2}{(n^{1/2} + p^{1/2})(p^{-1/2} + n^{-1/2})^{1/3}} \xrightarrow{d} TW_1$$

and

$$\frac{n\lambda_1 - (p^{1/2} - n^{1/2})^2}{(n^{1/2} - p^{1/2})(p^{-1/2} - n^{-1/2})^{1/3}} \xrightarrow{d} TW_1$$

as $p \to \infty$.
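To make the normalization in the Tracy-Widom statements concrete, here is a minimal sketch that computes the centering and scaling constants and standardizes a pair of extreme eigenvalues. It assumes the centering $(p^{1/2} \pm n^{1/2})^2$ and the scaling $(n^{1/2} \pm p^{1/2})(p^{-1/2} \pm n^{-1/2})^{1/3}$ applied to $n\lambda_p$ and $n\lambda_1$; the function names `edge_constants` and `standardize` are hypothetical.

```python
import math


def edge_constants(p, n):
    """Return ((mu_max, sigma_max), (mu_min, sigma_min)) for 0 < p < n.

    mu is the centering and sigma the scaling applied to n * lambda.
    """
    if not 0 < p < n:
        raise ValueError("the sketch assumes 0 < p < n")
    sp, sn = math.sqrt(p), math.sqrt(n)
    mu_max = (sp + sn) ** 2
    sigma_max = (sn + sp) * (1.0 / sp + 1.0 / sn) ** (1.0 / 3.0)
    mu_min = (sp - sn) ** 2
    sigma_min = (sn - sp) * (1.0 / sp - 1.0 / sn) ** (1.0 / 3.0)
    return (mu_max, sigma_max), (mu_min, sigma_min)


def standardize(lam_max, lam_min, p, n):
    """Center and scale n*lambda_p and n*lambda_1 with the constants above."""
    (mu_max, sigma_max), (mu_min, sigma_min) = edge_constants(p, n)
    return ((n * lam_max - mu_max) / sigma_max,
            (n * lam_min - mu_min) / sigma_min)
```

Dividing the centering constants by $n$ gives $(1 \pm \sqrt{p/n})^2$, so the extreme eigenvalues are centered at the edges of the Marchenko-Pastur support with ratio $p/n$.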



Remark 1.3. For technical reasons, it is convenient to work with continuous random variables $x_{ij}$. As a result, events such as eigenvalue collision will only occur with probability zero (see Lemma 3.5). Because none of our bounds depends on how continuous the $x_{ij}$ are, one can recover the discrete case, especially the Bernoulli case, from the continuous one by a standard limiting argument using Weyl's inequality (see Lemma 2.2).
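If the population is normal, then we can derive the Tracy-Widom law for both the largest and smallest eigenvalues of the matrix $\mathcal{R} = RR^T$, where

(1.2)
$$R = (r_{ij})_{p,n} := \begin{pmatrix} \frac{x_{11} - \bar{x}_1}{\|x_1 - \bar{x}_1\|} & \frac{x_{12} - \bar{x}_1}{\|x_1 - \bar{x}_1\|} & \cdots & \frac{x_{1n} - \bar{x}_1}{\|x_1 - \bar{x}_1\|} \\ \vdots & \vdots & & \vdots \\ \frac{x_{p1} - \bar{x}_p}{\|x_p - \bar{x}_p\|} & \frac{x_{p2} - \bar{x}_p}{\|x_p - \bar{x}_p\|} & \cdots & \frac{x_{pn} - \bar{x}_p}{\|x_p - \bar{x}_p\|} \end{pmatrix}.$$

Here $\bar{x}_i = n^{-1}\sum_{j=1}^{n} x_{ij}$, and $x_i - \bar{x}_i$ means that $\bar{x}_i$ is subtracted from each element $x_{ij}$ of $x_i$, $i = 1, \dots, p$. We denote the ordered eigenvalues of $\mathcal{R}$ by $0 \le \lambda_1(\mathcal{R}) \le \cdots \le \lambda_p(\mathcal{R})$ below. Actually $\mathcal{R}$ is the sample correlation matrix when the population mean is unknown.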

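Theorem 1.3. For the sample correlation matrix $\mathcal{R}$ with i.i.d. $N(0,1)$ elements, if $p/n \to y \in (0,1)$, we have

$$\frac{n\lambda_p(\mathcal{R}) - (p^{1/2} + n^{1/2})^2}{(n^{1/2} + p^{1/2})(p^{-1/2} + n^{-1/2})^{1/3}} \xrightarrow{d} TW_1$$

and

$$\frac{n\lambda_1(\mathcal{R}) - (p^{1/2} - n^{1/2})^2}{(n^{1/2} - p^{1/2})(p^{-1/2} - n^{-1/2})^{1/3}} \xrightarrow{d} TW_1$$

as $p \to \infty$.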



Throughout the paper, we will use the following ad hoc definitions of frequent events provided in [16].

Definition 1 (Frequent events). [16] Let $E$ be an event depending on $n$.

• $E$ holds asymptotically almost surely if $\mathbb{P}(E) = 1 - o(1)$.

• $E$ holds with high probability if $\mathbb{P}(E) \ge 1 - O(n^{-c})$ for some constant $c > 0$ (independent of $n$).

• $E$ holds with overwhelming probability if $\mathbb{P}(E) \ge 1 - O_C(n^{-C})$ for every constant $C > 0$ (or equivalently, that $\mathbb{P}(E) \ge 1 - \exp(-\omega(\log n))$).

• $E$ holds almost surely if $\mathbb{P}(E) = 1$.

The main strategy is to prove a so-called "Green function comparison theorem" , 
which was raised by Erdos, Yau and Yin in ^ for generalized Wigner matrices. 
We will provide a "Green function comparison theorem" for sample correlation matrices obeying the assumption $C_1$ in Section 4, see Theorem 4.3. Then by the comparison theorem, we can compare the generally distributed case with the Bernoulli case to get Theorem 1.2. As an application, we can also get Theorem 1.3.

Our article is organized as follows. In Section 2, we state some basic tools, which can also be found in the series of works [16], [17], [18] and [20]. We provide some main technical lemmas and theorems in Section 3. The most important one is the so-called delocalization property of singular vectors, which turns out to be an obstacle to establishing the Green function comparison theorem in the sample correlation matrix case. In Section 4, we provide a Green function comparison theorem to prove the edge universality for sample correlation matrices satisfying the assumption $C_1$. In Section 5, we give the proofs of our main results: Theorem 1.2 and Theorem 1.3.



2. Basic Tools 

In this section, we state some basic tools from linear algebra and probability theory. Firstly, we denote the ordered singular values of $Y$ by

$$0 \le \sigma_1 \le \sigma_2 \le \cdots \le \sigma_p,$$

then we have $\sigma_i = \lambda_i^{1/2}$. If we further denote the unit right singular vector of $Y$ corresponding to $\sigma_i$ by $u_i$ and the left one by $v_i$, we have

(2.1) $Y u_i = \sigma_i v_i$

and

(2.2) $Y^T v_i = \sigma_i u_i.$

Below we shall state some tools for eigenvalues, singular values and singular vec- 
tors without proof. 

Lemma 2.1. (Cauchy's interlacing law). Let $1 \le p \le n$.

(i) If $A_n$ is an $n \times n$ Hermitian matrix, and $A_{n-1}$ is an $(n-1) \times (n-1)$ minor, then $\lambda_i(A_n) \le \lambda_i(A_{n-1}) \le \lambda_{i+1}(A_n)$ for all $1 \le i < n$.

(ii) If $A_{n,p}$ is a $p \times n$ matrix, and $A_{n,p-1}$ is a $(p-1) \times n$ minor, then $\sigma_i(A_{n,p}) \le \sigma_i(A_{n,p-1}) \le \sigma_{i+1}(A_{n,p})$ for all $1 \le i < p$.

(iii) If $p < n$, $A_{n,p}$ is a $p \times n$ matrix, and $A_{n-1,p}$ is a $p \times (n-1)$ minor, then $\sigma_{i-1}(A_{n,p}) \le \sigma_i(A_{n-1,p}) \le \sigma_i(A_{n,p})$ for all $1 \le i \le p$, with the understanding that $\sigma_0(A_{n,p}) = 0$. (For $p = n$, one can consider its transpose and use (ii) instead.)

Lemma 2.2. (Weyl's inequality) Let $1 \le p \le n$.

• If $M, N$ are $n \times n$ Hermitian matrices, then $|\lambda_i(M) - \lambda_i(N)| \le \|M - N\|_{op}$ for all $1 \le i \le n$.

• If $M, N$ are $p \times n$ matrices, then $|\sigma_i(M) - \sigma_i(N)| \le \|M - N\|_{op}$ for all $1 \le i \le p$.

The following lemma is on the components of a singular vector, which can be found in [18].

Lemma 2.3. [18] Let $p, n \ge 1$, and let

$$A_{p,n} = \begin{pmatrix} A_{p,n-1} & h \end{pmatrix}$$

be a $p \times n$ matrix with $h \in \mathbb{C}^p$, and let $\binom{u}{x}$ be a right unit singular vector of $A_{p,n}$ with singular value $\sigma_i(A_{p,n})$, where $x \in \mathbb{C}$ and $u \in \mathbb{C}^{n-1}$. Suppose that none of the singular values of $A_{p,n-1}$ is equal to $\sigma_i(A_{p,n})$. Then

$$|x|^2 = \frac{1}{1 + \sum_{j=1}^{\min(p,n-1)} \frac{\sigma_j(A_{p,n-1})^2}{(\sigma_j(A_{p,n-1})^2 - \sigma_i(A_{p,n})^2)^2}\, |v_j(A_{p,n-1}) \cdot h|^2},$$

where $\{v_1(A_{p,n-1}), \dots, v_{\min(p,n-1)}(A_{p,n-1})\} \subset \mathbb{C}^p$ is an orthonormal system of left singular vectors corresponding to the non-trivial singular values of $A_{p,n-1}$, and $v_j(A_{p,n-1}) \cdot h = v_j(A_{p,n-1})^* h$ with $v_j(A_{p,n-1})^*$ being the complex conjugate of $v_j(A_{p,n-1})$.

Similarly, if

$$A_{p,n} = \begin{pmatrix} A_{p-1,n} \\ l^* \end{pmatrix}$$

for some $l \in \mathbb{C}^n$, and $(v^T, y)^T$ is a left unit singular vector of $A_{p,n}$ with singular value $\sigma_i(A_{p,n})$, where $y \in \mathbb{C}$ and $v \in \mathbb{C}^{p-1}$, and none of the singular values of $A_{p-1,n}$ is equal to $\sigma_i(A_{p,n})$, then

$$|y|^2 = \frac{1}{1 + \sum_{j=1}^{\min(p-1,n)} \frac{\sigma_j(A_{p-1,n})^2}{(\sigma_j(A_{p-1,n})^2 - \sigma_i(A_{p,n})^2)^2}\, |u_j(A_{p-1,n}) \cdot l|^2},$$

where $\{u_1(A_{p-1,n}), \dots, u_{\min(p-1,n)}(A_{p-1,n})\} \subset \mathbb{C}^n$ is an orthonormal system of right singular vectors corresponding to the non-trivial singular values of $A_{p-1,n}$.

Further, we need a frequently used tool in random matrix theory: the Stieltjes transform of the ESD $F_p(x)$, which is defined by

$$s_p(z) = \int \frac{1}{x - z}\, dF_p(x)$$

for any $z = E + i\eta$ with $E \in \mathbb{R}$ and $\eta > 0$. If we introduce the Green function $G(z) = (W - z)^{-1}$, we also have

(2.3) $s_p(z) = \frac{1}{p}\mathrm{Tr}\, G(z) = \frac{1}{p}\sum_{k=1}^{p} G_{kk}.$

Here we denote by $G_{jk}$ the $(j,k)$ entry of $G(z)$. As is well known, the convergence of a tight sequence of probability measures is equivalent to the convergence of its Stieltjes transform sequence towards the corresponding transform of the limiting measure. So, corresponding to the convergence of $F_p(x)$ towards the Marchenko-Pastur law $F_{MP,y}(x)$, whose density function is given by

(2.4) $\rho_{MP,y}(x) = \frac{1}{2\pi x y}\sqrt{(b - x)(x - a)}\, 1_{[a,b]}(x),$

where $a = (1 - \sqrt{y})^2$, $b = (1 + \sqrt{y})^2$, $s_p(z)$ almost surely converges to the Stieltjes transform $s(z)$ of $F_{MP,y}(x)$. Here

(2.5) $s(z) = \frac{1 - y - z + \sqrt{(z - 1 - y)^2 - 4y}}{2yz},$

where the square root is defined as the analytic extension of the positive square root of the positive numbers. Moreover, $s(z)$ satisfies the equation

(2.6) $s(z) + \frac{1}{y + z - 1 + yzs(z)} = 0.$
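The relation between the Marchenko-Pastur density, its Stieltjes transform, and the self-consistent equation can be checked numerically. The sketch below uses only the Python standard library and rests on two assumptions that should be read as such: the density is taken to be $\sqrt{(b-x)(x-a)}/(2\pi x y)$ on $[a,b]$, and the self-consistent equation is taken in the form $s + 1/(y + z - 1 + yzs) = 0$, i.e. $yzs^2 + (z + y - 1)s + 1 = 0$. The function names are hypothetical, and the root is selected by requiring a positive imaginary part for $\Im z > 0$.

```python
import cmath
import math


def mp_edges(y):
    """Edges a, b of the Marchenko-Pastur support for ratio 0 < y < 1."""
    return (1.0 - math.sqrt(y)) ** 2, (1.0 + math.sqrt(y)) ** 2


def mp_density(x, y):
    """Marchenko-Pastur density at x (zero outside [a, b])."""
    a, b = mp_edges(y)
    if x <= a or x >= b:
        return 0.0
    return math.sqrt((b - x) * (x - a)) / (2.0 * math.pi * x * y)


def mp_stieltjes(z, y):
    """Root of y*z*s^2 + (z + y - 1)*s + 1 = 0 with the larger imaginary part."""
    disc = cmath.sqrt((z + y - 1.0) ** 2 - 4.0 * y * z)
    roots = [(-(z + y - 1.0) + sign * disc) / (2.0 * y * z) for sign in (1, -1)]
    return max(roots, key=lambda s: s.imag)


def midpoint_integral(f, lo, hi, steps=20000):
    """Plain midpoint rule; works for real or complex integrands."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


y = 0.5                      # illustrative ratio p/n
z = complex(1.0, 0.5)        # illustrative spectral parameter
a, b = mp_edges(y)
s = mp_stieltjes(z, y)
residual = abs(s + 1.0 / (y + z - 1.0 + y * z * s))
total_mass = midpoint_integral(lambda t: mp_density(t, y), a, b)
```

Selecting the root by its imaginary part avoids having to track the branch of the complex square root by hand.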

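If we denote the $k$-th row of $Y$ by $y_k^T$ and the remaining $(p-1) \times n$ matrix after deleting $y_k^T$ by $Y^{(k)}$, one has

$$W = \begin{pmatrix} 1 & y_1^T Y^{(1)T} \\ Y^{(1)} y_1 & Y^{(1)} Y^{(1)T} \end{pmatrix}.$$

By Schur's complement,

$$G_{11} = \frac{1}{1 - z - y_1^T Y^{(1)T} (Y^{(1)} Y^{(1)T} - z)^{-1} Y^{(1)} y_1}$$

(2.7) $\qquad = \frac{1}{1 - z - y_1^T Y^{(1)T} Y^{(1)} (Y^{(1)T} Y^{(1)} - z)^{-1} y_1}.$

The formula for $G_{kk}$ is analogous. By (2.3), we have the following lemma on the decomposition of $s_p(z)$:

Lemma 2.4. For the matrix $W$, we have

$$s_p(z) = \frac{1}{p}\sum_{k=1}^{p} \frac{1}{1 - z - y_k^T Y^{(k)T} Y^{(k)} (Y^{(k)T} Y^{(k)} - z)^{-1} y_k}.$$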

The last main tool we need comes from probability theory: a concentration inequality for projections of random vectors. The details of the proof can be found in [16].

Lemma 2.5. Let $X = (\xi_1, \dots, \xi_n)^T \in \mathbb{C}^n$ be a random vector whose entries are independent with mean zero, variance 1, and are bounded in magnitude by $K$ almost surely for some $K$, where $K \ge 10(\mathbb{E}|\xi|^4 + 1)$. Let $H$ be a subspace of dimension $d$ and $\pi_H$ the orthogonal projection onto $H$. Then

$$\mathbb{P}\left(\left|\|\pi_H(X)\| - \sqrt{d}\right| \ge t\right) \le 10\exp\left(-\frac{t^2}{10K^2}\right).$$

In particular, one has

$$\|\pi_H(X)\| = \sqrt{d} + O(K \log n)$$

with overwhelming probability.
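The concentration of the projected norm around $\sqrt{d}$ can be observed in a small simulation. The following is only an illustration under stated assumptions: the subspace $H$ is drawn at random (independently of $X$) by orthonormalizing Gaussian vectors, $X$ has $\pm 1$ entries, and the dimensions, the seed, and the helper names are hypothetical. It is a single draw, not a verification of the tail bound.

```python
import math
import random


def orthonormal_basis(vectors):
    """Modified Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(a * b for a, b in zip(q, w))
            w = [a - coeff * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        basis.append([a / norm for a in w])
    return basis


def projection_norm(basis, x):
    """Euclidean norm of the orthogonal projection of x onto span(basis)."""
    return math.sqrt(sum(sum(a * b for a, b in zip(q, x)) ** 2 for q in basis))


rng = random.Random(2)  # fixed seed: an illustrative choice
n, d = 200, 40          # hypothetical ambient dimension and subspace dimension
h_basis = orthonormal_basis(
    [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(d)]
)
x = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # mean 0, variance 1, bounded
deviation = projection_norm(h_basis, x) - math.sqrt(d)
```

In this draw `deviation` is of order one, while $\sqrt{d} \approx 6.3$; the independence of $H$ and $X$ is essential, which is exactly the issue the modified matrix in Section 3 is designed to address.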

3. Main Technical Results 

In this section, we provide our main technical results: the local MP law for sample correlation matrices, and the delocalization property for the singular vectors. Both results will be proved under much weaker assumptions than $C_1$. We state them as the following two theorems.

Let us introduce more notation. For any interval $I \subset \mathbb{R}$, we use $N_I$ to denote the number of the eigenvalues of $W$ falling into $I$, and use $|I|$ to denote the length of $I$.

Theorem 3.1. (Local MP law). Assume that $p/n \to y$ with $0 < y < 1$, and that $\{x_{ij} : 1 \le i \le p, 1 \le j \le n\}$ is a collection of independent (but not necessarily identically distributed) random variables with mean zero and variance 1. If $|x_{ij}| \le K$
almost surely for some K = o{p^/^^5'^log~^p) with some Q < 5 < 1/2 and some large 
constant Cq for all i,j, one has with overwhelming probability that the number of 
eigenvalues Nj for any interval I C [a/2,26] with \I\ > ^ ^ obeys 



(3.1) $\left|N_I - p\int_I \rho_{MP,y}(x)\,dx\right| \le \delta p|I|.$



Remark 3.1. The topic of the limiting spectrum distribution on short scales was 
firstly raised by Erdos, Schlein and Yau in [6J for Wigner matrices. Such type of 
results are shown to be quite necessary for the proof of the famous universality 
conjectures in the Random Matrix Theory, for example, see [S] and [16j . 

Remark 3.2. A strong type of the local MP law has been established for more general matrix models in a very recent paper of Pillai and Yin, see Theorem 1.5 of [14]. In fact, from Theorem 1.5 of [14], one can get a more precise bound than that in (3.1) if we replace $\rho_{MP,y}(x)$ by the nonasymptotic MP law $\rho_W(x)$ defined in Section 4. Moreover, Pillai and Yin's strong local MP law also provides some crucial estimates on individual elements of the Green function $G$, which will be used to establish our Green function comparison theorem in Section 4.

Theorem 3.2. (Delocalization of singular vectors) Under the assumptions of Theorem 3.1 and $\mathbb{E}x_{ij}^3 = 0$, if we assume the $x_{ij}$'s are continuous random variables, then with overwhelming probability all the left and right unit singular vectors of $W$ have all components uniformly of size at most $p^{-1/2} K^{C_0/2} \log^{O(1)} p$.

Remark 3.3. Note that a little weaker delocalization property for the left singular 
vector Vi can also be found in Theorem 1.2 (iv) of Pillai and Yin 



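Now if we denote

$$X = \begin{pmatrix} \frac{x_{11}}{\sqrt{n}} & \frac{x_{12}}{\sqrt{n}} & \cdots & \frac{x_{1n}}{\sqrt{n}} \\ \vdots & \vdots & & \vdots \\ \frac{x_{p1}}{\sqrt{n}} & \frac{x_{p2}}{\sqrt{n}} & \cdots & \frac{x_{pn}}{\sqrt{n}} \end{pmatrix},$$

then $S := XX^T$ is the sample covariance matrix corresponding to $W$. We further denote the ordered eigenvalues of $S$ by $0 \le \tilde{\lambda}_1 \le \cdots \le \tilde{\lambda}_p$ and introduce the matrix

$$D = \begin{pmatrix} \frac{\sqrt{n}}{\|x_1\|} & & \\ & \ddots & \\ & & \frac{\sqrt{n}}{\|x_p\|} \end{pmatrix}.$$

By Theorem 5.9 of [1], $\tilde{\lambda}_p = b + o(1)$ holds with overwhelming probability. In fact, it is easy to see that $\tilde{\lambda}_1 = a + o(1)$ holds with overwhelming probability as well by a similar discussion through the moment method. Observe that $W = DSD$, and $\|D - I\|_{op} = o(1)$ holds with overwhelming probability. By Lemma 2.2 we also have

(3.2) $\lambda_1 = a + o(1), \quad \lambda_p = b + o(1)$

holds with overwhelming probability. So below we always assume $\lambda_i \in (a/2, 2b)$, $1 \le i \le p$.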
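The factorization of the sample correlation matrix through the sample covariance matrix can be verified directly on a small example. The sketch below assumes $S = n^{-1}(x_i \cdot x_k)_{i,k}$ and $D = \mathrm{diag}(\sqrt{n}/\|x_i\|)$; the dimensions, Gaussian entries, and seed are illustrative, and `dot` is a hypothetical helper.

```python
import math
import random


def dot(u, v):
    """Euclidean inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


rng = random.Random(3)  # fixed seed: an illustrative choice
p, n = 3, 30            # hypothetical small dimensions
x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
norms = [math.sqrt(dot(row, row)) for row in x]

# Sample correlation matrix W and sample covariance matrix S.
w = [[dot(x[i], x[k]) / (norms[i] * norms[k]) for k in range(p)] for i in range(p)]
s = [[dot(x[i], x[k]) / n for k in range(p)] for i in range(p)]

# D is diagonal, so D S D has entries d_i * s_ik * d_k.
d = [math.sqrt(n) / norms[i] for i in range(p)]
dsd = [[d[i] * s[i][k] * d[k] for k in range(p)] for i in range(p)]

max_gap = max(abs(w[i][k] - dsd[i][k]) for i in range(p) for k in range(p))
# For a diagonal matrix the operator norm is the largest absolute entry.
d_minus_identity = max(abs(v - 1.0) for v in d)
```

Because $\|x_i\|^2/n$ concentrates around 1, `d_minus_identity` shrinks as $n$ grows, which is what lets Weyl's inequality transfer the edge estimates from $S$ to $W$.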

The proof of Theorem 3.1 is partially based on the lemmas of Section 2. It turns out to be quite similar to the case of sample covariance matrices and Wigner matrices, see [7], [8], [18] and [20]. However, the delocalization of the right singular vector $u_i$ of $Y$ is an obstacle, owing to the lack of independence between the columns of $Y$.

For the convenience of the reader, we provide a short proof of Theorem 3.1 first. Our main task in this section is the proof of Theorem 3.2, more precisely, the right singular vector part of the theorem.

Proof of Theorem 3.1. We first provide the following crude upper bound on $N_I$.

Lemma 3.3. Under the assumptions of Theorem 3.1, we have for any interval $I \subset \mathbb{R}$ with $|I| \gg K^2 \log^2 p/p$, and large enough $C > 0$,

$$N_I \le Cp|I|$$

with overwhelming probability.

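Proof. Firstly we introduce the notation

$$W^{(k)} = Y^{(k)} Y^{(k)T}, \quad \mathcal{W}^{(k)} = Y^{(k)T} Y^{(k)}, \quad G^{(k)} = (W^{(k)} - z)^{-1}, \quad \mathcal{G}^{(k)} = (\mathcal{W}^{(k)} - z)^{-1}.$$

Let $\lambda_\alpha^{(1)}$, $\alpha = 1, \dots, p-1$ denote the eigenvalues of the $(p-1) \times (p-1)$ matrix $W^{(1)}$. Thus $\lambda_\alpha^{(1)}$, $\alpha = 1, \dots, p-1$ are also eigenvalues of the $n \times n$ matrix $\mathcal{W}^{(1)}$, whose other eigenvalues are all zero. We further use $\nu_\alpha$ to denote the eigenvector of $\mathcal{W}^{(1)}$ corresponding to the eigenvalue $\lambda_\alpha^{(1)}$, and introduce the quantity

(3.3) $\xi_\alpha := n|y_1 \cdot \nu_\alpha|^2 = \frac{n}{\|x_1\|^2}|x_1 \cdot \nu_\alpha|^2 =: \frac{n}{\|x_1\|^2}\tilde{\xi}_\alpha.$

We can rewrite (2.7) as

(3.4) $G_{11} = \frac{1}{1 - z - \frac{1}{n}\sum_{\alpha=1}^{p-1} \frac{\lambda_\alpha^{(1)} \xi_\alpha}{\lambda_\alpha^{(1)} - z}}.$

By Cauchy's interlacing law, we also have $\lambda_\alpha^{(1)} \in [a/2, 2b]$ with overwhelming probability. Then for any $z = E + i\eta$ such that $E \in [a/2, 2b]$, we have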
(3.5) I^Gfcfcl < -TTT < 
for any $k \in \{1, \dots, p\}$. Now we set $I = [E - \eta/2, E + \eta/2]$. Notice that there always exists some positive constant $C_2$ such that

(3.6) $N_I \le C_2\, p\eta\, \Im s_p(z) = C_2\, \eta \sum_{k=1}^{p} \Im G_{kk}.$

If we set C3 = C1C2, it fohows from ()3.5p and ()3.6p that 

P(iV/ > Cpr]) 
p 



Y '^'^kk > C^^Cp and Nj > CpT^) 

■ k=l ^ 

< pP^ Y CsC-^PV and iV/ > Cpi]^ 

< Pff'^^^ Yl ia < CsC-^pri and Ni > Cpr]^ 

a:\xi^''-E\<r,/2 

(3.7) < pP(||xi|p > 2n) +pp(^ Y L<2C3C-^PV a.nd Ni yCpT]^. 

a:\xi^^-E\<r,/2 

The first term of (3.7) is obviously exponentially small by the Hoeffding inequality. For the second term, we use Lemma 2.5. We specialize $X$ in Lemma 2.5 to be $x_1$ and the subspace $H$ to be the one generated by the eigenvectors $\{\nu_\alpha : \lambda_\alpha^{(1)} \in I\}$. Thus one has

$$d = N_I \ge Cp\eta \ge CK^2 \log^2 n.$$

Then by Lemma 2.5 we have

Y L = \MX)\\'>lcpr] 



a:|Ai''-E|<r,/2 

with overwhelming probability. This implies that the second term of (3.7) is exponentially small when $C$ is large enough. So we conclude the proof of Lemma 3.3. □

Now we proceed to prove Theorem 3.1. The basic strategy is to compare $s_p(z)$ and $s(z)$ with small imaginary part $\eta$. In fact, we have the following proposition.

Proposition 3.1. Let 1/10 and Li,L2,e,6 > 0. Suppose that one has the 

bound 

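$$|s_p(z) - s(z)| \le \delta$$

with (uniformly) overwhelming probability for all $z$ such that $\Re z \in [L_1, L_2]$ and $\Im z \ge \eta$. Then for any interval $I \subset [L_1 + \varepsilon, L_2 - \varepsilon]$ with $|I| \ge \max(2\eta, \frac{\eta}{\delta}\log\frac{1}{\delta})$, one has

$$\left|N_I - p\int_I \rho_{MP,y}(x)\,dx\right| \le \delta n|I|$$

with overwhelming probability.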
Remark 3.4. Proposition 3.1 is an extension of Lemma 29 of [18] up to the edge, whose proof can be found in [20]. In fact, the proof can be carried out in the same manner as that of Lemma 64 in [16] for the Wigner matrix.

So in view of Proposition 13. H to prove Theorem 13. H we only need to prove that 
the bound 

(3.8) \spiz) - s{z)\ < 6 

holds with (uniformly) overwhelming probability for all z = E + iij such that E G 

[a/2— e, 2f)+e] and 1/10 > rj > ^ ^^f " . To prove ()3.8p we need to derive a consistent 
equation for Sp{z), which is similar to the equation (|2.6p for s{z). 
Firstly by Lemma 12.41 we can rewrite Sp{z) as 

1 ^ 1 

= - Y] 3-' 

pf-^l-z-dk 

with 

Then the proof of ()3.8p can be taken in the same manner as the counterpart of 
the sample covariance matrix case (see the proof of formula (4.12) of [20j). We only 
state the different parts below and leave the details to the reader. We remark here 
that we consider the domain [Li,L2] = [a/2 — e,26 + e] rather than [a,b] in [20] . 
However, if one goes through the proof in |20] . it is not difficult to see that the proof 
towards any domain [Li,L2] containing [a,b] is the same. The only minor difference 
between our case and the sample covariance matrix in [SOj is the estimation of dk- 
We will only deal with di in the sequel. The others are analogous. By ()3.3p and 
p.4p . we have 

Aa C,a _ 1 Aa ^1 Aa {C,a ~ IJ 

-I Aa — Z a=l '^a ~ Z a=\ 

For the first term of (13.911 we have 



1 

(3-9) ^1 = ,(1) _ - ^Z^ >(i) _ ' ^Z^ ,(1) 



1 

-E 



= ^ + i V ^ •= ^(1 + zs^^^ (z)) 



where 



1 



is the Stieltjes transform of the ESD of VF*-^-*. Then by the Cauchy's interlacing 
property, we have 



I 



(z) - (1 - i).«(z)| = 0{- [ -^dx) = O(-) 
P P Jr\x - z\'^ prj 
Consequently one has

(3.10) = 

a=l '^a ~ Z 

Now we provide the following lemma on the second term of ()3.9p . 
Lemma 3.4. For all z = E + irj with E E [a/2 - e,2b + e] and t] > 

1 ^ Xa\^a — 1) , e2^ 

uniformly in z with overwhelming probability. 

Proof. We set Rj = (^^ - 1). By ([331) and the fact that 

n ^ , K'^ loff^ n , 

(3.11) = l + 0(- ^ ^ 



holds with overwhelming probability, we have for any Tc{l,-- - ,p— 1} 

(3.12) ^i?. = ^j^ixi.^^f -in. 

By using Lemma [221 we have 

(3.13) ^ |xi • = T + O (^VriTlog n V log^ , 

where a V 6 = max(a, 6). By inserting ()3.1ip and ()3.13p into ()3.12p . we have 

If we choose T = log*^^^^ n, we always have 

Y,R,=Y,\^,.v,\''-\T\+o{6^). 

Then the following part of the proof is the same as that in the sample covariance 
matrix case. One can refer to the proof of Proposition 4.6 of |20j for details. □ 

Now we proceed to the proof of Theorem 3.1. By (3.9), (3.10) and Lemma 3.4, we can get the following equation

(3.14) $s_p(z) + \frac{1}{y + z - 1 + yzs_p(z) + o(\delta^2)} = 0.$

By a standard comparison of (3.14) and (2.6) (see [20] for example), we have (3.8). Thus by Proposition 3.1 we conclude the proof of Theorem 3.1. □
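Now we turn to the proof of Theorem 3.2. At first, we introduce the matrix

$$\tilde{W}_{(n)} := \tilde{Y}_{(n)} \tilde{Y}_{(n)}^T \quad \text{with} \quad \tilde{Y}_{(n)} := \begin{pmatrix} \frac{x_{11}}{\|\tilde{x}_1\|} & \frac{x_{12}}{\|\tilde{x}_1\|} & \cdots & \frac{x_{1,n-1}}{\|\tilde{x}_1\|} \\ \frac{x_{21}}{\|\tilde{x}_2\|} & \frac{x_{22}}{\|\tilde{x}_2\|} & \cdots & \frac{x_{2,n-1}}{\|\tilde{x}_2\|} \\ \vdots & \vdots & & \vdots \\ \frac{x_{p1}}{\|\tilde{x}_p\|} & \frac{x_{p2}}{\|\tilde{x}_p\|} & \cdots & \frac{x_{p,n-1}}{\|\tilde{x}_p\|} \end{pmatrix},$$

where

$$\tilde{x}_j = (x_{j1}, x_{j2}, \dots, x_{j,n-1})^T.$$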

We will need the following lemma on eigenvalue collision. 

Lemma 3.5. If we assume the random variables $x_{ij}$'s are continuous, the following events hold with probability one.

i): $W$ has simple eigenvalues, i.e. $\lambda_1 < \lambda_2 < \cdots < \lambda_p$.

ii): $W$ and $W^{(1)}$ have no eigenvalue in common.

iii): $W$ and $\tilde{W}_{(n)}$ have no eigenvalue in common.

The proof of Lemma 3.5 will be postponed to Appendix A.

Proof of Theorem 3.2. The proof for the left singular vectors is nearly the same as in the sample covariance matrix case shown in [20], using Lemma 2.3, ii) of Lemma 3.5 and Theorem 3.1. Moreover, as we have mentioned in Remark 3.3, a slightly weaker delocalization property for the left singular vectors has been provided in [14]. So we will only present the proof for the right singular vectors below.

Below we denote the $k$-th column of $Y$ by $h_k$, and the remaining $p \times (n-1)$ matrix after deleting $h_k$ by $Y_{(k)}$. Note that $Y_{(n)}$ is not independent of the last column $h_n$. However, for the sample covariance matrix case, the independence between the column and the corresponding submatrix is essential for one to use concentration results such as Lemma 2.5. To overcome the inconvenience caused by the dependence, we will use the modified matrix $\tilde{Y}_{(n)}$ defined above. Notice that the matrix $\tilde{Y}_{(n)}$ is independent of the random vector $(x_{1n}, x_{2n}, \dots, x_{pn})^T$.
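Now we define

$$\Delta_1 = Y_{(n)}^T Y_{(n)} - \tilde{Y}_{(n)}^T \tilde{Y}_{(n)}$$

and

(3.15) $\Delta_2 = Y_{(n)}^T - \tilde{Y}_{(n)}^T.$

The following lemma handles the operator norms of $\Delta_1$ and $\Delta_2$.

Lemma 3.6. Under the assumption of Theorem 3.2, we have

$$\|\Delta_1\|_{op},\ \|\Delta_2\|_{op} = O\left(\frac{K^2}{n}\right)$$

with overwhelming probability.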
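Proof. Observe that

$$\Delta_1 = (Y_{(n)} - \tilde{Y}_{(n)})^T Y_{(n)} + \tilde{Y}_{(n)}^T (Y_{(n)} - \tilde{Y}_{(n)}) = \Delta_2 Y_{(n)} + \tilde{Y}_{(n)}^T \Delta_2^T.$$

We only discuss the second term since the first one is analogous. It is easy to see that the entries of $\Delta_2^T$ satisfy

$$\frac{x_{ij}}{\|x_i\|} - \frac{x_{ij}}{\|\tilde{x}_i\|} = \frac{x_{ij}(\|\tilde{x}_i\| - \|x_i\|)}{\|x_i\|\|\tilde{x}_i\|} = -\frac{x_{ij}x_{in}^2}{\|x_i\|\|\tilde{x}_i\|(\|x_i\| + \|\tilde{x}_i\|)} = -\frac{x_{in}^2}{\|x_i\|(\|x_i\| + \|\tilde{x}_i\|)} \cdot \frac{x_{ij}}{\|\tilde{x}_i\|}.$$

It follows that

$$\Delta_2^T = -\Delta_3 \tilde{Y}_{(n)},$$

where $\Delta_3$ is a $p \times p$ diagonal matrix with $(i,i)$-th entry

$$\frac{x_{in}^2}{\|x_i\|(\|x_i\| + \|\tilde{x}_i\|)}.$$

Thus it is easy to see

$$\|\Delta_3\|_{op} = O\left(\frac{K^2}{n}\right)$$

with overwhelming probability. Together with the fact that $\|\tilde{Y}_{(n)}\|_{op} \le C$ holds with overwhelming probability, we can conclude the proof of Lemma 3.6. □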
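The entrywise identity behind this proof is elementary and can be checked on a single row. The sketch below assumes the truncated row $\tilde{x}_i$ consists of the first $n-1$ entries of $x_i$ and uses the relation $\|x_i\|^2 - \|\tilde{x}_i\|^2 = x_{in}^2$; the row length, Gaussian entries, and seed are illustrative assumptions.

```python
import math
import random

rng = random.Random(4)  # fixed seed: an illustrative choice
n = 25                  # hypothetical row length
row = [rng.gauss(0.0, 1.0) for _ in range(n)]

norm_full = math.sqrt(sum(v * v for v in row))        # ||x_i||
norm_trunc = math.sqrt(sum(v * v for v in row[:-1]))  # ||x~_i||, last entry dropped

# Left-hand side: difference of the two normalizations, entry by entry.
lhs = [row[j] / norm_full - row[j] / norm_trunc for j in range(n - 1)]

# Right-hand side: minus a diagonal factor times the entry normalized by ||x~_i||.
factor = row[-1] ** 2 / (norm_full * (norm_full + norm_trunc))
rhs = [-factor * row[j] / norm_trunc for j in range(n - 1)]

max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
```

Since `factor` is at most $x_{in}^2/\|x_i\|^2$, a bound $|x_{in}| \le K$ together with $\|x_i\|^2$ of order $n$ gives the $O(K^2/n)$ size of the diagonal entries.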

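Now we proceed to the proof of Theorem 3.2. We write

$$u_i = \binom{w}{x},$$

where $x$ is the last component of $u_i$. Without loss of generality, we only prove the theorem for $x$. Notice that $u_i$ is the eigenvector of $\mathcal{W} = Y^T Y$ corresponding to the eigenvalue $\lambda_i$. From

$$\begin{pmatrix} Y_{(n)}^T Y_{(n)} & Y_{(n)}^T h_n \\ h_n^T Y_{(n)} & h_n^T h_n \end{pmatrix} \binom{w}{x} = \lambda_i \binom{w}{x}$$

we have

(3.16) $Y_{(n)}^T Y_{(n)} w + x Y_{(n)}^T h_n = \lambda_i w,$

and

(3.17) $h_n^T Y_{(n)} w + x h_n^T h_n = \lambda_i x.$

(3.16) can be rewritten as

$$(\tilde{Y}_{(n)}^T \tilde{Y}_{(n)} + \Delta_1) w + x(\tilde{Y}_{(n)}^T + \Delta_2) h_n = \lambda_i w.$$

It follows that

(3.18) $(\tilde{Y}_{(n)}^T \tilde{Y}_{(n)} - \lambda_i) w = -x\tilde{Y}_{(n)}^T h_n - x\Delta_2 h_n - \Delta_1 w.$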
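Note that $\tilde{Y}_{(n)}^T \tilde{Y}_{(n)}$ shares the same nonzero eigenvalues with $\tilde{W}_{(n)}$, so by iii) of Lemma 3.5 we can always view the matrix $\tilde{Y}_{(n)}^T \tilde{Y}_{(n)} - \lambda_i$ as invertible. Consequently,

$$\|w\|^2 = [x\tilde{Y}_{(n)}^T h_n + x\Delta_2 h_n + \Delta_1 w]^T (\tilde{Y}_{(n)}^T \tilde{Y}_{(n)} - \lambda_i)^{-2} [x\tilde{Y}_{(n)}^T h_n + x\Delta_2 h_n + \Delta_1 w].$$

If $x = 0$ then Theorem 3.2 is evidently true. Consider $x \ne 0$ below. Together with the fact that $|x|^2 = 1 - \|w\|^2$, we have

$$x^2 = \frac{1}{1 + [\tilde{Y}_{(n)}^T h_n + \Delta_2 h_n + x^{-1}\Delta_1 w]^T (\tilde{Y}_{(n)}^T \tilde{Y}_{(n)} - \lambda_i)^{-2} [\tilde{Y}_{(n)}^T h_n + \Delta_2 h_n + x^{-1}\Delta_1 w]}.$$

Now we use $\tilde{\lambda}_j$ to denote the ordered nonzero eigenvalues of $\tilde{Y}_{(n)}^T \tilde{Y}_{(n)}$ and $\tilde{u}_j$ the corresponding unit eigenvectors, and set the projection

$$P = I - \sum_{j=1}^{p} \tilde{u}_j \tilde{u}_j^T.$$

Then by the spectral decomposition one has

(3.19) $x^2 = \frac{1}{1 + \sum_{j=1}^{p} \frac{1}{(\tilde{\lambda}_j - \lambda_i)^2}\, |\tilde{u}_j \cdot (\tilde{Y}_{(n)}^T h_n + \Delta_2 h_n + x^{-1}\Delta_1 w)|^2 + \Delta},$

where

$$\Delta = \frac{1}{\lambda_i^2}\, \|P(\tilde{Y}_{(n)}^T h_n + \Delta_2 h_n + x^{-1}\Delta_1 w)\|^2.$$

Therefore to show $|x| \le n^{-1/2} K^{C_0/2} \log^{O(1)} n$, we only need to prove

(3.20) $\sum_{j=1}^{p} \frac{1}{(\tilde{\lambda}_j - \lambda_i)^2}\, |\tilde{u}_j \cdot (\tilde{Y}_{(n)}^T h_n + \Delta_2 h_n + x^{-1}\Delta_1 w)|^2 \ge nK^{-C_0} \log^{-O(1)} n.$

To prove (3.20), we need to separate the issue into the bulk case and the edge case. Before that, we shall provide the following lemma which will be used in both cases.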

Lemma 3.7. // we denote the unit eigenvector of l^(n) corresponding to Xj by 
Vj, under the assumption of Theorem \3.2\ we have for any J C {!,••• ,p} with 
\J\ =d<nK-'^, 

\^E(*j • /in)']'/' = Vd + 0{Klogn) 

with overwhelming probability. 

We will postpone the proof of Lemma 3.7 to Appendix B. In fact, it can be viewed as a modification of Lemma 2.5.

Now we decompose the proof of Theorem 3.2 into two parts: bulk case and edge case.
• Bulk case: $\lambda_i \in [a + \varepsilon, b - \varepsilon]$ for some $\varepsilon > 0$.

Note that the local MP law (Theorem I3.ip can also be applied to the matrix 
(«) (")■ Thus we can find a set J C {1, • • • ,p} with \J\ > K^log^^n such that 

Xj = Xi + 0{K^ log^^ n/n) for any j ^ J when Aj is in the bulk region of the MP 
law. It follows that 

2 

(3.21) > C ,0 E I^J- • (^H^- + + 

log n ^ ' 

By the singular value decomposition, we have 

(3.22) Uj ■ Y^^hn = Xf^Vj ■ hn. 

Now we compare 

(3.23) Yl ■ ^(n)^"!' = E ^il^i • ^-1' 
with 

(3.24) ^|nj-(A2/i„ + x-^Aiw)|2 

for any J C {1, • • • ,p} such that log^° n < \J\ < nK~^. 

we get the conclusion for the bulk case. So we 

assume |x| > n" 

-i/2^Co/2 logO(i) ^ ^^i^^ ggi- ([220]). By LemmaESl if we choose 
Co ^ 20 (say), we have 

mM < 2|J|(||A2||op||/ln||)'+2x-Vl(l|Al||op||w||)2 

(3.25) < \J\n-^K'^''/^log~^^^K 

with overwhelming probability. On the other side. Lemma 13.71 implies 

(3.26) = Cn-Wj\+0{K^log^n)) 
with overwhelming probability. So one has 

(3.27) Yl I^J- • ^(n)^r^|^ > E l^^' ' ^^^^^ + ^"^Aiw)|2, 

where ^ means "much larger than", i.e. 

( Y ■ (^2hn + a;-iAiw)|2)/( J] In, . y(^)/i„|2) = 0(1). 

Notice that for any real number sequence • • • , 5m} and {Ti, • • • , T^} with 
YllLi^l ^ YliLi^^, there exists some c near 1 such that J^iLii'^i + ^«)^ ^ 
m q2



cYT=i St- Therefore by <^Ml,^^ and (HmH we can obtain 

which imphes (|3.2U|) directly. So we conclude the proof for the bulk case. 

Next, we turn to the edge case. 

• Edge case: a — o(l) <Ai<a + eor6 — e<Ai<fe + o(l) with some e > 0. 

For the edge case we also begin with the representation (j3.19p . By ()3.18p . we have 
(3.28) w = -x(Y(^)y(„) - Ai)-i(Y(^)/i„ + Aa/i™ + x'^Aiw). 

Inserting ([3:28]) and (I3J5I1 into (fXTTD we find 

(Yii^K + /^2hnV {Y^n)%) - >'i)~^{Y{n)hn + As/in + X^^Aiw) = - A^. 

Furthermore, 



|x"2w^Afw| < |x|-2||Ai||op||w|p = IIAillop^-^ 



Thus one has 

{Y^^)K + Aa/in + x-iAiw)^(Y(^)y(„) - {Y^^-^hn + As/i™ + x-^Aiw) 

X2 

Similarly to the bulk case, we only need to get ()3.20p . Below we also assume 
|x| > Cn-i/2|;^*^o/2iQgO{i)^ (IMjl . Similar to ^TWif . by using Lemma ESI 

we have 

(3.30) \uj ■ (Aa/in + x-^Aiw)|2 < n-^K'^'''/'^ log"^(^) n. 
Moreover, by Lemma 13.71 and (j3.22p . we also have 

(3.31) \uj ■ Y^^Kl"^ = Xi\vj ■ K\ < Cn-^'K'^ log^ n 

holds with overwhelming probability. Thus to provide p.20p . it suffices to show 

7\ T^l^^- • (^W^" + + x-iAiw)|4 > K-^'"+^ log-«(i) n 

j=i (Aj - Ai) 

instead. By the Cauchy-Schwarz inequality, we only need to prove 
(3-32) Yl |u, •(l'(I)/in + A2/i„ + x-iAiw)|2>log-0«n 

i'T-<j<i+T+ '^^l 

with overwhelming probability for some 1 < r_ < r+ < log'^*^^-' n. 



(3.29) =hlK-Xi + 0[\\Ai 



lop ^2 



18 ZHIGANG BAG, GUANGMING PAN, AND WANG ZHOU 

Notice that under the assumption \x\ > Cn^^/'^K'^"/'^ log*^^^^ n, by Lemma [3. 61 we 
have 

1 — 

IIAillop— ^ = o(l). 

Moreover, it is not difficult to see /i^/in = y + o(l) with overwhelming probability. 
Thus by p.29p . we have with overwhelming probability 

^ 1 ^ 1 ^ ^ 

7\ TtI^ • (5^(L/in + A2/i„ + x-iAiw)|2 - -||P(y(^)/i„ + A2/i„ + x-iAiw)||2 

I- 

= hlK - Ai + 0(||Ai||op^^) = y-\i + o(l). 



Observing that 



and 

||P(A2/in + x-iAiw)||2 <n-\ 

we also have 

P 1 

(3.33) 7\ ■ ^^S)''" + + x-^Aiw)|2 = y-Xi + o(l). 

j=i (Aj - Aj) 

So to prove (|3.32p we only need to evaluate 

(3.34) J2 r\ -X J ^^' ■ ^^W^" + + x"^Aiw)|2. 

j<i-T. or j>i+T+ V^j Aj) 

To do this, we let A > 100 be a constant large enough. For any interval I of 



length |/| = K'^log^ n/n, we set dj := where 



dist ( Aj , /) = min | Aj — x\ sgn( Aj , /) . 

X&I 

Here sgn(Aj,/) = l{resp. —1) when Aj is on the left {resp. right) hand side of /. 

By Theorem 13. H the interval I with \dj\ < logn contains at most log*^^^^ n 
eigenvalues. So we can set T_ , T-f accordingly so that such intervals don't contain 
any Xj if j < i — r_ or j > i + . In the following we only consider I such that 
> logn in the estimation of (|3.34p . Note that for A^ G /, 

1 _ _l_ 1 

X,-X,~ dj\I\^^^dj\I\'- 

Using (|3.30p and (|3.3ip again one has 

2\{uj ■ y(^,)hn)(Uj ■ (A2/l„ + X~^/^i\v))\ + \uj ■ A2/l„ + X'^'Uj ■ AlWp 

< Cn-ii^-0« log-«W n 
when $|x| \ge n^{-1/2} K^{C_0/2} \log^{O(1)} n$. Thus we can find

> \ {2\{uj ■ Yl!^)K){uj ■ {A2K + x-^Aiw))| + \uj ■ Aa/in + x~^Uj ■ Aiwj^) 

(3.35) < log-^d) n < log-«(i) n. 

\di\\I\n \di\ 

Here we used Lemma 3.3 in the last inequality. Now we partition the real line into
intervals $I$ of length $\log^{A} n/n$, and sum (3.35) over all intervals $I$ with $|d_I| > \log n$.
Then

^ di 

So we can evaluate 

(3.36) 7r^'^^-^w^"''= ^ 71^'^^- 

j<i-T. or j>i+T+ V^i j<i~T^ or j>i+T+ ^-^i '^^^ 

instead of (3.34). The evaluation of (3.36) is really the same as the counterpart in
the sample covariance matrix case (see (4.5) in [20]) by inserting Lemma 3.7, so we
omit the details here. In fact, we can finally get

(3.36) $= \mathrm{p.v.}\int_a^b \frac{y x}{x-\lambda_i}\,\rho_{MP,y}(x)\,dx + o(1) = y + y\lambda_i\,\mathrm{p.v.}\int_a^b \frac{\rho_{MP,y}(x)}{x-\lambda_i}\,dx + o(1),$

where p.v. means the principal value.

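Using the formula for the Stieltjes transform $s(z)$, one can get from residue calculus that for $\lambda_i \in [a, b]$,

$$\mathrm{p.v.}\int_a^b \frac{\rho_{MP,y}(x)}{x-\lambda_i}\,dx = \frac{1-y-\lambda_i}{2y\lambda_i},$$

and for $\lambda_i \notin [a, b]$,

$$\mathrm{p.v.}\int_a^b \frac{\rho_{MP,y}(x)}{x-\lambda_i}\,dx = \frac{1-y-\lambda_i+\sqrt{(\lambda_i-1-y)^2-4y}}{2y\lambda_i}.$$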

Consequently, by the definition of $a$ and $b$, if $|\lambda_i - a| \le o(1)$, we have

(3.33) $= -1 + 2\sqrt{y} + o(1)$, (3.36) $= \sqrt{y} + o(1)$.

And if $|\lambda_i - b| \le o(1)$, we have

(3.33) $= -1 - 2\sqrt{y} + o(1)$, (3.36) $= -\sqrt{y} + o(1)$.

Then it is easy to see that when $0 < y < 1$, (3.32) holds with overwhelming probability
for the case where $|\lambda_i - a| = o(1)$ or $|\lambda_i - b| = o(1)$. Moreover, by continuity we can
adjust the value of $\varepsilon$ to get the conclusion for the general case $a - o(1) \le \lambda_i \le a + \varepsilon$
or $b - \varepsilon \le \lambda_i \le b + o(1)$. Thus we complete the proof of the delocalization for $u_i$. □
4. Green function comparison theorem

In this section, we provide a Green function comparison theorem for the sample
correlation matrices satisfying $\mathbf{C}_1$. The proof relies heavily on the recent results of
Pillai and Yin [14] on sample covariance matrices and on the delocalization property for
the right singular vectors proved in the last section. At first, we will borrow some
results from [14] directly with only minor changes of notation. In fact, by Theorem 1.5
in [14], it is not difficult to see that Theorem 1.2 and Theorem 1.3 of [13] also hold for
sample correlation matrices under our basic condition $\mathbf{C}_1$.

To state the results in [13], we need to introduce some notation. Define the
parameter

$$\varphi := (\log p)^{\log\log p},$$

and

$$\lambda_\pm := \Big(1 \pm \sqrt{\tfrac{p}{n}}\Big)^2.$$

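Moreover we introduce the "nonasymptotic Marchenko-Pastur law"

$$\rho_W(x) := \frac{n}{2\pi x p}\sqrt{(\lambda_+ - x)(x - \lambda_-)}\,\mathbf{1}_{[\lambda_-,\lambda_+]}(x)$$

and the corresponding distribution function $F_W(x)$ and Stieltjes transform

$$s_W(z) = \int \frac{\rho_W(x)}{x-z}\,dx.$$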
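As a numerical sanity check on these definitions, the following minimal sketch evaluates the density and its Stieltjes transform by a midpoint rule. It assumes that $\rho_W$ is the standard Marchenko-Pastur density with ratio $p/n \le 1$ and unit variance; all function names are illustrative and are not taken from the paper or from [14].

```python
import cmath
import math

def mp_edges(p, n):
    """Spectral edges lambda_- and lambda_+ for aspect ratio p/n (assumed <= 1)."""
    r = math.sqrt(p / n)
    return (1.0 - r) ** 2, (1.0 + r) ** 2

def mp_density(x, p, n):
    """Assumed form of rho_W: Marchenko-Pastur density with ratio p/n."""
    lo, hi = mp_edges(p, n)
    if x <= lo or x >= hi:
        return 0.0
    return n / (2.0 * math.pi * x * p) * math.sqrt((hi - x) * (x - lo))

def integrate_on_support(f, p, n, steps=20000):
    """Midpoint rule for the integral of f(x) * rho_W(x) over [lambda_-, lambda_+]."""
    lo, hi = mp_edges(p, n)
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f(x) * mp_density(x, p, n)
    return total * h

def stieltjes_numeric(z, p, n):
    """s_W(z) computed directly from its integral definition."""
    return integrate_on_support(lambda x: 1.0 / (x - z), p, n)

def stieltjes_closed_form(z, p, n):
    """Root of y*z*s^2 + (z + y - 1)*s + 1 = 0 with the larger imaginary part.

    For Im z > 0 this is the root in the upper half plane.
    """
    y = p / n
    root = cmath.sqrt((z - 1.0 - y) ** 2 - 4.0 * y)
    candidates = [(1.0 - y - z + sign * root) / (2.0 * y * z) for sign in (1.0, -1.0)]
    return max(candidates, key=lambda s: s.imag)
```

For instance, with $p/n = 1/2$ the total mass and the first moment of the density should both be close to 1, and the two evaluations of $s_W(z)$ should agree at a point such as $z = 1 + 0.5i$.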

For $\zeta \ge 0$, define the set

(4.1) $S(\zeta) := \{z \in \mathbb{C} : 0 \le E \le 5\lambda_+,\ \varphi^{\zeta}p^{-1} \le \eta \le 10(1 + \tfrac{p}{n})\}.$

And we say that an event $\Omega$ holds with $\zeta$-high probability if there exists a constant
$C > 0$ such that

(4.2) $\mathbb{P}(\Omega^c) \le p^{C}\exp(-\varphi^{\zeta})$

for large enough $p$. Note that (4.2) implies that the event holds with overwhelming
probability if $\zeta > 0$. We further denote

$$\Lambda_d := \max_k |G_{kk} - s_W(z)|, \quad \Lambda_o := \max_{k\ne l} |G_{kl}|, \quad \Lambda := |s_p(z) - s_W(z)|.$$

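Lemma 4.1. (Theorem 1.5, [14]) Under the condition $\mathbf{C}_1$, for any $\zeta > 0$ there exists
a constant $C_\zeta$ such that the following events hold with $\zeta$-high probability.

(i) The Stieltjes transform of the ESD of $W$ satisfies

$$\bigcap_{z\in S(C_\zeta)}\Big\{\Lambda(z) \le \frac{\varphi^{C_\zeta}}{p\eta}\Big\}.$$

(ii) The individual matrix elements of the Green function satisfy

$$\bigcap_{z\in S(C_\zeta)}\Big\{\Lambda_o(z) + \Lambda_d(z) \le \varphi^{C_\zeta}\Big(\sqrt{\frac{\Im s_W(z)}{p\eta}} + \frac{1}{p\eta}\Big)\Big\}.$$

(iii) Uniformly in $E \in \mathbb{R}$,

$$|F_p(E) - F_W(E)| \le \varphi^{C_\zeta}p^{-1}.$$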

We also need the following lemma on $s_W(z)$.

Lemma 4.2. (Lemma 26, [14]) Let $\kappa := \min(|\lambda_+ - E|, |E - \lambda_-|)$. For $z = E + i\eta \in S(0)$ (see (4.1)), we have the following relations:

(4.3) \sw{z)\^l, |l-s^(2)| ~ V^T^, 



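(4.4) $\Im s_W(z) \sim \begin{cases} \dfrac{\eta}{\sqrt{\kappa+\eta}} & \text{if } \kappa \ge \eta \text{ and } E \notin [\lambda_-,\lambda_+], \\[2mm] \sqrt{\kappa+\eta} & \text{if } \kappa \le \eta \text{ or } E \in [\lambda_-,\lambda_+], \end{cases}$

where $A \sim B$ means $C^{-1}B \le A \le CB$ for some constant $C$. Furthermore

$$\frac{\Im s_W(z)}{p\eta} \ge O\Big(\frac{1}{p}\Big) \quad\text{and}\quad \partial_\eta\frac{\Im s_W(z)}{p\eta} \le 0.$$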

Now we set $Y^v = (y_{ij}^v) := (x_{ij}^v/\|x_i^v\|)_{p,n}$, with elements $x_{ij}^v$ satisfying our basic
condition $\mathbf{C}_1$. Correspondingly we let $W^v = Y^vY^{vT}$, $G^v(z) = (W^v - z)^{-1}$ and
$s_p^v(z) = \frac{1}{p}\mathrm{Tr}\,G^v(z)$. Define the matrix $W^w$, the Green function $G^w(z)$ and the
Stieltjes transform $s_p^w(z)$ analogously for another random sequence $\{x_{ij}^w\}$ satisfying
$\mathbf{C}_1$ which is independent of $\{x_{ij}^v\}$. The aim of this section is to prove the following
Green function comparison theorem.

Below we only state the results and proofs for the largest eigenvalue. The case of the
smallest one is analogous.

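Theorem 4.3. (Green function comparison theorem on the edge). Let $F : \mathbb{R} \to \mathbb{R}$
be a function whose derivatives $F^{(\alpha)}$ satisfy

$$\max_x |F^{(\alpha)}(x)|(|x| + 1)^{-C_1} \le C_1, \quad \alpha = 1,2,3,4$$

with some constant $C_1 > 0$. Then there exists $\varepsilon_0 > 0$ depending only on $C_1$ such
that for any $\varepsilon < \varepsilon_0$ and for any real numbers $E$, $E_1$ and $E_2$ satisfying

$$|E - \lambda_+| \le p^{-2/3+\varepsilon}, \quad |E_1 - \lambda_+| \le p^{-2/3+\varepsilon}, \quad |E_2 - \lambda_+| \le p^{-2/3+\varepsilon},$$

and $\eta = p^{-2/3-\varepsilon}$, we have

(4.5) $|\mathbb{E}^vF(p\eta\Im s_p^v(z)) - \mathbb{E}^wF(p\eta\Im s_p^w(z))| \le Cp^{-1/6+C\varepsilon}, \quad z = E + i\eta,$

and

(4.6) $\Big|\mathbb{E}^vF\Big(p\int_{E_1}^{E_2}dx\,\Im s_p^v(x + i\eta)\Big) - \mathbb{E}^wF\Big(p\int_{E_1}^{E_2}dx\,\Im s_p^w(x + i\eta)\Big)\Big| \le Cp^{-1/6+C\varepsilon}$

for some constant $C$ and large enough $p$.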
Proof of Theorem 4.3. The proof is similar to that of Theorem 6.3 of [14]. Moreover,
the proof of (4.6) can be carried out in the same manner as that of (4.5), so we will just
present the proof of (4.5) below. The basic strategy is to estimate the successive
differences of matrices which differ by a row. For $1 \le \gamma \le p$, we denote by $Y_\gamma$ the
random matrix whose $j$-th row is the same as that of $Y^v$ if $j \le \gamma$ and that of $Y^w$
otherwise; in particular $Y_0 = Y^w$ and $Y_p = Y^v$. And we set

$$W_\gamma = Y_\gamma Y_\gamma^T.$$

We shall compare $W_{\gamma-1}$ with $W_\gamma$ by using the following lemma. For simplicity, we
denote

$$s_p^{(i)}(z) = \frac{1}{p}\mathrm{Tr}\,G^{(i)}(z), \qquad \tilde s_p^{(i)}(z) = s_p^{(i)}(z) - \frac{1}{pz}.$$

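Lemma 4.4. For any sample correlation matrix $W$ with elements satisfying the
basic assumption $\mathbf{C}_1$, if $|E - \lambda_+| \le p^{-2/3+\varepsilon}$ and $p^{-2/3} \gg \eta \ge p^{-2/3-\varepsilon}$ for some
$\varepsilon > 0$, then we have

$$\mathbb{E}F(p\eta\Im s_p(z)) - \mathbb{E}F(p\eta\Im\tilde s_p^{(i)}(z)) = A(Y^{(i)}, m_1, m_2) + O(p^{-7/6+C\varepsilon}),$$

where the functional $A(Y^{(i)}, m_1, m_2)$ only depends on the distribution of $Y^{(i)}$ and
the first two moments $m_1, m_2$ of $x_{ij}$.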

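Remark 4.1. We always assume $m_1 = 0$, $m_2 = 1$ in our case.

Note that

$$Y_{\gamma-1}^{(\gamma)} = Y_\gamma^{(\gamma)},$$

thus Lemma 4.4 implies that

$$\mathbb{E}F(\eta\Im\mathrm{Tr}(W_{\gamma-1} - z)^{-1}) - \mathbb{E}F(\eta\Im\mathrm{Tr}(W_\gamma - z)^{-1}) = O(p^{-7/6+C\varepsilon}).$$

Then the proof of Theorem 4.3 can be completed by the telescoping argument.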
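The row-replacement scheme behind the telescoping argument can be sketched as follows. This is only an illustration under stated assumptions: the names `Yv` and `Yw` for the two row-normalized matrices, the helper names, and the choice of statistic (the sum of squared entries of $YY^T$, standing in for $F(\eta\Im\mathrm{Tr}(W - z)^{-1})$) are all hypothetical and are not the functional used in the paper.

```python
import math

def normalize_rows(X):
    """Divide every row by its Euclidean norm, as in the definition of Y."""
    return [[x / math.sqrt(sum(t * t for t in row)) for x in row] for row in X]

def interpolating_matrix(Yv, Yw, gamma):
    """Rows 1..gamma (1-based) are taken from Yv, the remaining rows from Yw."""
    return [list(Yv[j]) if j < gamma else list(Yw[j]) for j in range(len(Yv))]

def gram(Y):
    """W = Y Y^T."""
    return [[sum(a * b for a, b in zip(r, s)) for s in Y] for r in Y]

def statistic(Y):
    """Illustrative smooth functional of W (assumption: stands in for F)."""
    return sum(w * w for row in gram(Y) for w in row)

def telescoping_terms(Yv, Yw):
    """Differences statistic(Y_gamma) - statistic(Y_{gamma-1}), gamma = 1..p."""
    p = len(Yv)
    chain = [interpolating_matrix(Yv, Yw, g) for g in range(p + 1)]
    return [statistic(chain[g]) - statistic(chain[g - 1]) for g in range(1, p + 1)]

# Small deterministic example with p = 3 rows and n = 4 columns.
Yv = normalize_rows([[1.0, 2.0, -1.0, 0.5], [0.3, -1.2, 0.7, 2.0], [-0.8, 0.4, 1.5, -0.6]])
Yw = normalize_rows([[0.2, -0.7, 1.1, 0.9], [1.4, 0.1, -0.3, 0.8], [0.6, 0.6, -1.0, 0.2]])
total_difference = sum(telescoping_terms(Yv, Yw))
```

Each term in `telescoping_terms` compares two matrices that differ in exactly one row, so a one-row estimate summed over the $p$ steps controls the total difference `statistic(Yv) - statistic(Yw)`.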

Therefore it suffices to prove Lemma 4.4 in the sequel. To do this, we need to
provide some bounds on $\mathcal{G}^{(i)}$. We only state the result for $i = 1$ in the following
lemma since the others are analogous.

Lemma 4.5. Under the assumptions in Lemma 4.4, we have for $\varepsilon > 0$ small enough,

(4.7) $|y_1^T(\mathcal{G}^{(1)})^2y_1| \le p^{1/3+C\varepsilon}$

and

(4.8) $|(\mathcal{G}^{(1)})_{ij}| \le p^{C\varepsilon}, \quad |((\mathcal{G}^{(1)})^2)_{ij}| \le p^{1/3+C\varepsilon}$

hold with overwhelming probability.

The proof of Lemma 4.5 will be postponed to the end of this section. Now we
begin to prove Lemma 4.4 assuming Lemma 4.5.
Proof of Lemma 4.4. The proof proceeds in a similar manner to that of Lemma 6.5 in [14].

At first we rewrite (2.7) as

(4.9) $G_{11} = \dfrac{1}{-z - zy_1^T\mathcal{G}^{(1)}(z)y_1}$

by using the facts that

$$\mathcal{W}^{(1)}\mathcal{G}^{(1)}(z) = I + z\mathcal{G}^{(1)}(z), \quad y_1^Ty_1 = 1.$$

Moreover, by Schur's complement, we also have

(4.10) $\mathrm{Tr}\,G - \mathrm{Tr}\,G^{(1)} = G_{11} + \dfrac{y_1^TY^{(1)T}(G^{(1)})^2Y^{(1)}y_1}{-z - zy_1^T\mathcal{G}^{(1)}(z)y_1}.$

Inserting (4.9) and the identity

$$Y^{(1)T}(G^{(1)})^2Y^{(1)} = \mathcal{W}^{(1)}(\mathcal{G}^{(1)})^2 = \mathcal{G}^{(1)} + z(\mathcal{G}^{(1)})^2$$

into (4.10) we can get

(4.11) $\mathrm{Tr}\,G - \mathrm{Tr}\,G^{(1)} + z^{-1} = zG_{11}\,(y_1^T(\mathcal{G}^{(1)})^2(z)y_1).$
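The two resolvent identities above can be checked numerically on a small example. The sketch below assumes that $G = (YY^T - z)^{-1}$, that $G^{(1)}$ and $\mathcal{G}^{(1)}$ are the resolvents of $Y^{(1)}Y^{(1)T}$ and $Y^{(1)T}Y^{(1)}$ respectively, where $Y^{(1)}$ is $Y$ with its first row $y_1^T$ removed, and that the rows of $Y$ have unit norm. The helper names are illustrative, and the matrix inverse is a plain Gauss-Jordan elimination written out so that the block needs no external library.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (complex entries allowed)."""
    m = len(M)
    aug = [list(map(complex, row)) + [complex(i == j) for j in range(m)]
           for i, row in enumerate(M)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[m:] for row in aug]

def resolvent(M, z):
    """(M - z)^{-1}."""
    m = len(M)
    return inverse([[M[i][j] - (z if i == j else 0) for j in range(m)] for i in range(m)])

def quad_form(v, M):
    """v^T M v, with a plain transpose (no complex conjugation)."""
    return sum(v[i] * M[i][j] * v[j] for i in range(len(v)) for j in range(len(v)))

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def identity_errors(X, z):
    """Absolute errors in (4.9) and (4.11) for the row-normalized version of X."""
    Y = [[x / math.sqrt(sum(t * t for t in row)) for x in row] for row in X]
    y1, Y1 = Y[0], Y[1:]
    G = resolvent(matmul(Y, transpose(Y)), z)          # p x p
    G1 = resolvent(matmul(Y1, transpose(Y1)), z)       # (p-1) x (p-1)
    calG1 = resolvent(matmul(transpose(Y1), Y1), z)    # n x n
    err_49 = abs(G[0][0] - 1 / (-z - z * quad_form(y1, calG1)))
    lhs_411 = trace(G) - trace(G1) + 1 / z
    rhs_411 = z * G[0][0] * quad_form(y1, matmul(calG1, calG1))
    return err_49, abs(lhs_411 - rhs_411)

X = [[1.0, 2.0, -1.0, 0.5], [0.3, -1.2, 0.7, 2.0], [-0.8, 0.4, 1.5, -0.6]]
errors = identity_errors(X, complex(1.3, 0.2))
```

Both errors should be at the level of floating-point rounding for any spectral parameter $z$ with nonzero imaginary part.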

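Now we define the quantity $B$ as

$$B := -zs_W(z)\Big[y_1^T\mathcal{G}^{(1)}(z)y_1 - \Big(-\frac{1}{zs_W(z)} - 1\Big)\Big].$$

Thus by (4.9) we have

$$B = -zs_W(z)\Big[\frac{1}{zs_W(z)} - \frac{1}{zG_{11}(z)}\Big] = \frac{s_W(z) - G_{11}}{G_{11}}.$$

By (ii) of Lemma 4.1 and (4.4) we can get

$$|B| \le p^{-1/3+2\varepsilon} \ll 1$$

with overwhelming probability. Thus we have the expansion

(4.12) $G_{11} = \dfrac{s_W(z)}{B+1} = s_W(z)\sum_{k\ge 0}(-B)^k.$

Now we set

$$y := \eta\,(\mathrm{Tr}\,G - \mathrm{Tr}\,G^{(1)} + z^{-1}).$$

It follows from (4.11) and (4.12) that

$$y = \eta zG_{11}\,y_1^T(\mathcal{G}^{(1)})^2y_1 = \sum_{k=1}^{\infty}y_k,$$

where

$$y_k := \eta zs_W(z)(-B)^{k-1}y_1^T(\mathcal{G}^{(1)})^2y_1.$$

Since $z$ and $s_W(z)$ are $O(1)$ by (4.3), by definitions and Lemma 4.5 we have

(4.13) $|y_k| \le O(p^{-k/3+C\varepsilon})$ and $|y| \le O(p^{-1/3+C\varepsilon})$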
with overwhelming probability. Thus we have

$$F(p\eta\Im s_p(z)) - F(p\eta\Im\tilde s_p^{(1)}(z)) = \sum_{k=1}^{3}\frac{1}{k!}F^{(k)}(p\eta\Im\tilde s_p^{(1)}(z))(\Im y)^k + O(p^{-4/3+C\varepsilon})$$

with overwhelming probability.

Similarly to the counterpart proof of Lemma 6.5 in [14], we only need to show

(4.14) $\mathbb{E}F^{(k)}(p\eta\Im\tilde s_p^{(1)}(z))(\Im y)^k = A_k(Y^{(1)}, m_1, m_2) + O(p^{-7/6+C\varepsilon}), \quad k = 1,2,3$

with some functional $A_k$ only depending on the distribution of $Y^{(1)}$, $m_1$ and $m_2$.

Since the proof of (4.14) is similar to the counterpart in [13], we will only state
the proof for $k = 3$ below. We use $\mathbb{E}_1$ to denote the expectation with respect to $y_1$
in the sequel. By using (4.13) we obtain

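(4.15) $F^{(3)}(p\eta\Im\tilde s_p^{(1)}(z))(\Im y)^3 = F^{(3)}(p\eta\Im\tilde s_p^{(1)}(z))(\Im y_1)^3 + O(p^{-4/3+C\varepsilon})$

with overwhelming probability. If we write $r_1 = \Re(\eta zs_W(z))$, $r_2 = \Im(\eta zs_W(z))$, then
we have

(4.16)
$$\begin{aligned}
\mathbb{E}_1(\Im y_1)^3 ={}& \mathbb{E}_1r_1^3\big(\Im(y_1^T(\mathcal{G}^{(1)})^2y_1)\big)^3 + \mathbb{E}_1r_2^3\big(\Re(y_1^T(\mathcal{G}^{(1)})^2y_1)\big)^3 \\
&+ 3\mathbb{E}_1r_1^2r_2\big(\Im(y_1^T(\mathcal{G}^{(1)})^2y_1)\big)^2\big(\Re(y_1^T(\mathcal{G}^{(1)})^2y_1)\big) \\
&+ 3\mathbb{E}_1r_1r_2^2\big(\Im(y_1^T(\mathcal{G}^{(1)})^2y_1)\big)\big(\Re(y_1^T(\mathcal{G}^{(1)})^2y_1)\big)^2 \\
={}& r_1^3\sum_{k_1,\dots,k_6}\mathbb{E}_1\Big(\prod_{i=1}^{6}\frac{x_{1k_i}}{\|x_1\|}\Big)\prod_{i=1}^{3}\Im\big((\mathcal{G}^{(1)})^2\big)_{k_{2i-1},k_{2i}} \\
&+ r_2^3\sum_{k_1,\dots,k_6}\mathbb{E}_1\Big(\prod_{i=1}^{6}\frac{x_{1k_i}}{\|x_1\|}\Big)\prod_{i=1}^{3}\Re\big((\mathcal{G}^{(1)})^2\big)_{k_{2i-1},k_{2i}} \\
&+ 3r_1^2r_2\sum_{k_1,\dots,k_6}\mathbb{E}_1\Big(\prod_{i=1}^{6}\frac{x_{1k_i}}{\|x_1\|}\Big)\prod_{j=1}^{2}\Im\big((\mathcal{G}^{(1)})^2\big)_{k_{2j-1},k_{2j}}\,\Re\big((\mathcal{G}^{(1)})^2\big)_{k_5,k_6} \\
&+ 3r_1r_2^2\sum_{k_1,\dots,k_6}\mathbb{E}_1\Big(\prod_{i=1}^{6}\frac{x_{1k_i}}{\|x_1\|}\Big)\Im\big((\mathcal{G}^{(1)})^2\big)_{k_1,k_2}\prod_{j=2}^{3}\Re\big((\mathcal{G}^{(1)})^2\big)_{k_{2j-1},k_{2j}}.
\end{aligned}$$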

Notice that if there exists a $k_i$ which appears only once in the above product,
then by the assumption that $x_{ij}$ is symmetric, we have

(4.17) $\mathbb{E}_1\Big(\prod_{i=1}^{6}\dfrac{x_{1k_i}}{\|x_1\|}\Big) = 0 = m_1.$

So we consider the case where each $k_i$ appears exactly twice. Firstly, we consider

$$\|x_1\|^6 = \sum_{k_1,k_2,k_3}x_{1k_1}^2x_{1k_2}^2x_{1k_3}^2 = \sum_{(1)}x_{1k_1}^2x_{1k_2}^2x_{1k_3}^2 + \sum_{(2)}x_{1k_1}^2x_{1k_2}^2x_{1k_3}^2,$$

where the first summation goes through the indices $k_1, k_2, k_3$ such that they are not
equal to each other, and the second summation goes through the remaining
indices. Then it is not difficult to see that the number of terms in the second summation
is of the order $O(n^2)$. By the exponential tail assumption and the Hoeffding
inequality, we can see

$$\mathbb{E}_1\Big(\frac{1}{\|x_1\|^6}\sum_{(2)}x_{1k_1}^2x_{1k_2}^2x_{1k_3}^2\Big) = O\Big(\frac{\log^{O(1)}n}{n}\Big).$$

Furthermore, since $x_{11}, \dots, x_{1n}$ are i.i.d., we have for $k_1, k_2, k_3$ not equal to each
other

(4.18) $\mathbb{E}_1\dfrac{x_{1k_1}^2x_{1k_2}^2x_{1k_3}^2}{\|x_1\|^6} = \dfrac{1}{n(n-1)(n-2)}\Big(1 - O\Big(\dfrac{\log^{O(1)}n}{n}\Big)\Big) = \dfrac{1}{n^3} + O\Big(\dfrac{\log^{O(1)}n}{n^4}\Big).$

Therefore by (4.16), (4.17), (4.18) and the fact that $\mathcal{G}^{(1)}$ only depends on $Y^{(1)}$, we
have



< 



|Ei(%i)3-i3(y«,mi,m2)| 
(3) 

6 



|fcl,fc2 



(l)^2■ 



\i9 



(1)^2- 



(4.19) 



■^C\r]zsw{z)f El 

(4),(5) 



n 



with some functional $A_3$ only depending on the distribution of $Y^{(1)}$, $m_1$ and $m_2$.
Here the first summation $\sum_{(3)}$ in (4.19) goes through the terms such that each
$k_i$, $i = 1, \dots, 6$, appears exactly twice. It is easy to see that there are $O(n^3)$ such
terms in total. And the second summation goes through the terms such that (4) no
$k_i$ appears only once and (5) at least one $k_i$ appears three times. Thus the
total number of terms in the second summation is of the order $O(n^2)$. Then by
using Lemma 4.5 and the fact



El 



n 

i=l 



Xlk, 



|Xl| 



o( 



wO(i) 



we have 

(4.20) $\mathbb{E}_1(\Im y_1)^3 = A_3(Y^{(1)}, m_1, m_2) + O(p^{-2+C\varepsilon}).$

By inserting (4.20) into (4.15), we can get (4.14) for $k = 3$. The cases of $k = 1$
and $k = 2$ can be proved similarly by inserting Lemma 4.5. So we conclude the
proof. □
Now we begin to prove Lemma 4.5.

Proof of Lemma 4.5. The proof of (4.7) is the same as the counterpart in [14] (see
(6.36) of [14]). So we only state the proof of (4.8) below. For ease of
presentation, we prove (4.8) for $\mathcal{G} = (\mathcal{W} - z)^{-1} := (Y^TY - z)^{-1}$ instead of $\mathcal{G}^{(1)}$. By
the spectral decomposition, we have

$$\mathcal{G} = \sum_{k=1}^{p}\frac{1}{\lambda_k - z}v_kv_k^T - \frac{1}{z}P,$$

where the projection $P = I - \sum_{k=1}^{p}v_kv_k^T$. Consequently, we have

$$(\mathcal{G}^{\alpha})_{ij} = \sum_{k=1}^{p}\frac{v_k(i)v_k(j)}{(\lambda_k - z)^{\alpha}} + \frac{P_{ij}}{(-z)^{\alpha}}, \quad \alpha = 1, 2.$$

Note that $|P_{ij}| \le 1$, $|z| \ge \lambda_+/2$. By the delocalization property of $v_k$ in Theorem
3.2, one has

wO(i) P 1 

with overwhelming probability. For $\alpha = 2$, by using (i) of Lemma 4.1 and (4.4) we
have

P 1 1 

which implies 



For a = 1, we have 



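Observe that

$$\sum_{k=1}^{p}\frac{1}{|\lambda_k - z|} = p\int\frac{1}{|x - z|}\,dF_p(x)$$

and

$$\Big|p\int\frac{1}{|x - z|}\,dF_p(x) - p\int\frac{1}{|x - z|}\,dF_W(x)\Big| \le Cp\int\frac{|F_p(x) - F_W(x)|}{|x - z|^2}\,dx \le \eta^{-1}p^{C\varepsilon}$$

with overwhelming probability. Here we used (iii) of Lemma 4.1 in the last inequality.
Consequently, we have

$$|\mathcal{G}_{ij}| \le (\log^{O(1)}p)\int\frac{1}{|x - z|}\,dF_W(x) + C.$$

It remains to estimate $\int|x - z|^{-1}\,dF_W(x)$. For $E \le \lambda_+$ such that $\lambda_+ - E \le p^{-2/3+\varepsilon}$,

$$\int\frac{1}{|x - z|}\,dF_W(x) = \Big(\int_{\lambda_-}^{2E-\lambda_+} + \int_{2E-\lambda_+}^{\lambda_+}\Big)\frac{1}{\sqrt{(x - E)^2 + \eta^2}}\,dF_W(x).$$

By the formula for the MP law, one has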
By the formula for the MP law, one has 
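(4.21) $\displaystyle\int_{\lambda_-}^{2E-\lambda_+}\frac{1}{\sqrt{(x - E)^2 + \eta^2}}\,dF_W(x) \le C\int_{\lambda_-}^{2E-\lambda_+}\frac{\sqrt{\lambda_+ - x}}{E - x}\,dx = O(1),$

and

$$\int_{2E-\lambda_+}^{\lambda_+}\frac{1}{\sqrt{(x - E)^2 + \eta^2}}\,dF_W(x) \le \eta^{-1}\int_{2E-\lambda_+}^{\lambda_+}dF_W(x) = O(1).$$

When $E > \lambda_+$, we still have (4.21). Therefore, we have

$$|\mathcal{G}_{ij}| \le p^{C\varepsilon}$$

with overwhelming probability. Thus we complete the proof. □

Theorem 4.3 is proved. □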



5. Proofs of main theorems

In this section, we provide the proofs of Theorem 1.2 and Theorem 1.3.

Proof of Theorem 1.2. The proof of Theorem 1.2 is totally based on Theorem 1.5 of
[14] and our Theorem 4.3. Let $W^v$ and $W^w$ be two independent sample correlation
matrices satisfying $\mathbf{C}_1$. We claim that there are $\varepsilon > 0$ and $\delta > 0$ such that for any
real number $s$ (which may depend on $p$) one has

(5.1) $\mathbb{P}^v(p^{2/3}(\lambda_p - \lambda_+) \le s - p^{-\varepsilon}) - p^{-\delta} \le \mathbb{P}^w(p^{2/3}(\lambda_p - \lambda_+) \le s) \le \mathbb{P}^v(p^{2/3}(\lambda_p - \lambda_+) \le s + p^{-\varepsilon}) + p^{-\delta}$

for $p \ge p_0$ sufficiently large, where $p_0$ is independent of $s$. The proof of (5.1) is
independent of the matrix model and totally based on Theorem 1.5 of [13] and our
Theorem 4.3; we refer to the proof of Theorem 1.7 of [13] for details.

Now if we choose $W^v$ to be the Bernoulli case, it is not difficult to get Theorem
1.2 by combining (5.1) and Theorem 1.1. □
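Proof of Theorem 1.3. Set the matrix

$$A = \begin{pmatrix}
\frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \frac{1}{\sqrt{n}} & \cdots & \frac{1}{\sqrt{n}} \\
\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & \cdots & 0 \\
\frac{1}{\sqrt{3\cdot 2}} & \frac{1}{\sqrt{3\cdot 2}} & -\frac{2}{\sqrt{3\cdot 2}} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \frac{1}{\sqrt{n(n-1)}} & \cdots & -\frac{n-1}{\sqrt{n(n-1)}}
\end{pmatrix}.$$

It is easy to see that $A$ is an orthogonal matrix. Moreover, it is elementary that

$$A(x_{i1} - \bar x_i, \dots, x_{in} - \bar x_i)^T = (0, z_{i1}, \dots, z_{i,n-1})^T,$$

where $z_{i1}, \dots, z_{i,n-1}$ is a sequence of i.i.d. $N(0,1)$ variables. Further, if we denote
the vector $z_i = (z_{i1}, \dots, z_{i,n-1})^T$, we also have

$$\|z_i\|^2 = \sum_{j=1}^{n-1}z_{ij}^2 = \sum_{j=1}^{n}(x_{ij} - \bar x_i)^2.$$

Thus one has

$$\mathcal{R} = RR^T = RA^TAR^T =: \mathcal{Z}.$$

Here

$$\mathcal{Z} = ZZ^T$$

with

$$Z = \begin{pmatrix} z_1^T/\|z_1\| \\ \vdots \\ z_p^T/\|z_p\| \end{pmatrix}_{p,\,n-1}.$$

Consequently, in the Gaussian case, $\mathcal{R}$ is also a $W$-type sample correlation matrix
defined in (1.1) with parameters $p$ and $n - 1$. Thus by Theorem 1.2 we have

(5.2) $\dfrac{(n-1)\lambda_p(\mathcal{R}) - (p^{1/2} + (n-1)^{1/2})^2}{((n-1)^{1/2} + p^{1/2})(p^{-1/2} + (n-1)^{-1/2})^{1/3}} \xrightarrow{d} TW_1$

and

(5.3) $\dfrac{(n-1)\lambda_1(\mathcal{R}) - (p^{1/2} - (n-1)^{1/2})^2}{((n-1)^{1/2} - p^{1/2})(p^{-1/2} - (n-1)^{-1/2})^{1/3}} \xrightarrow{d} TW_1$

as $p \to \infty$. Replacing $n - 1$ by $n$ in (5.2) and (5.3), we can complete the proof of
Theorem 1.3. □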
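The reduction from $n$ centered coordinates to $n - 1$ coordinates can be illustrated with a Helmert-type matrix. The sketch below assumes that $A$ has the constant row $(1/\sqrt{n}, \dots, 1/\sqrt{n})$ first, followed by the usual Helmert rows; the sign convention and the function names are assumptions made for illustration only.

```python
import math

def helmert(n):
    """n x n Helmert-type orthogonal matrix with the constant row first."""
    rows = [[1.0 / math.sqrt(n)] * n]
    for k in range(1, n):
        c = 1.0 / math.sqrt(k * (k + 1))
        # k equal entries, then -k*c, then zeros: orthogonal to the constant row.
        rows.append([c] * k + [-k * c] + [0.0] * (n - k - 1))
    return rows

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def centered(x):
    """Subtract the sample mean from every coordinate."""
    m = sum(x) / len(x)
    return [t - m for t in x]

def gram_error(A):
    """Largest deviation of A A^T from the identity."""
    n = len(A)
    return max(abs(sum(a * b for a, b in zip(A[i], A[j])) - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

x = [2.0, -1.0, 0.5, 3.0, 1.5]
A = helmert(len(x))
image = apply(A, centered(x))   # first coordinate vanishes
z = image[1:]                   # plays the role of (z_{i1}, ..., z_{i,n-1})
```

Because $A$ is orthogonal and annihilates the mean direction, the squared norm of `z` equals the centered sum of squares of `x`, which is the identity that lets $\mathcal{R}$ be rewritten as a correlation matrix built from $n - 1$ columns.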



6. Appendix A

In this appendix we prove Lemma 3.5.

Proof of Lemma 3.5. At first we prove (i). Note that $W = DSD$. Since $W$ and $SD^2$
share the same eigenvalues, it is equivalent to prove that the eigenvalues of $SD^2$ are
simple. We further introduce the polynomial $P_1(X)$ of $\{x_{ij}, 1 \le i \le p, 1 \le j \le n\}$
as

$$P_1(X) = \prod_{k=1}^{p}\|x_k\|^2.$$

It is easy to see that the event $P_1(X) = 0$ has zero Lebesgue measure, so we can always
assume $P_1(X) \ne 0$. As a consequence, we can reduce our problem to proving that the
matrix

$$Q := SD^2P_1(X)$$

has no multiple eigenvalue. Now we denote the discriminant of the characteristic
polynomial of $Q$ by $P_Q(X)$. Observe that all the entries of $Q$ are polynomials of
$\{x_{ij}, 1 \le i \le p, 1 \le j \le n\}$, so $P_Q(X)$ is also a polynomial of $\{x_{ij}, 1 \le i \le p, 1 \le j \le n\}$.
Since the set of zeros of any non-null polynomial in real variables has zero
Lebesgue measure, it suffices to prove that $P_Q(X)$ is not a null polynomial. In other
words, it suffices to find a family $\{x_{ij}, 1 \le i \le p, 1 \le j \le n\}$ such that $P_Q(X) \ne 0$.
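It is equivalent to show that $W$ has no multiple eigenvalue for one sample of the
collection $\{x_{ij}, 1 \le i \le p, 1 \le j \le n\}$ such that $P_1(X) \ne 0$.
Now we choose the sample as

$$x_{ij} = \begin{cases} 1, & j = i \text{ or } i + 1, \\ 0, & \text{otherwise}, \end{cases}$$

with $1 \le i \le p$, $1 \le j \le n$. Then it is not difficult to see

$$W = \begin{pmatrix}
1 & \frac12 & & & \\
\frac12 & 1 & \frac12 & & \\
 & \ddots & \ddots & \ddots & \\
 & & \frac12 & 1 & \frac12 \\
 & & & \frac12 & 1
\end{pmatrix},$$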

which is a Jacobi matrix with positive subdiagonal entries. Such a Jacobi matrix 
has simple eigenvalues, for example, see Proposition 2.40 of 
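The explicit sample above can also be checked directly. The following sketch builds the matrix with $x_{ij} = 1$ for $j \in \{i, i+1\}$, forms $W$, and compares it with the known eigenpairs of a tridiagonal Toeplitz matrix with diagonal 1 and off-diagonal 1/2, namely $\lambda_k = 1 + \cos(k\pi/(p+1))$ with eigenvector $(\sin(jk\pi/(p+1)))_{j=1}^{p}$. The function names are illustrative, and the closed form is a standard fact used here in place of the cited proposition.

```python
import math

def sample_matrix(p, n):
    """x_ij = 1 if j == i or j == i + 1 (same rule in 0-based indices); needs p < n."""
    return [[1.0 if j in (i, i + 1) else 0.0 for j in range(n)] for i in range(p)]

def correlation_matrix(X):
    """W = Y Y^T, where Y is X with every row divided by its Euclidean norm."""
    Y = [[x / math.sqrt(sum(t * t for t in row)) for x in row] for row in X]
    return [[sum(a * b for a, b in zip(r, s)) for s in Y] for r in Y]

def closed_form_eigenpairs(p):
    """Eigenpairs of the p x p tridiagonal matrix with 1 on and 1/2 next to the diagonal."""
    pairs = []
    for k in range(1, p + 1):
        theta = k * math.pi / (p + 1)
        pairs.append((1.0 + math.cos(theta),
                      [math.sin(j * theta) for j in range(1, p + 1)]))
    return pairs

def max_residual(W, pairs):
    """Largest entry of W v - lambda v over all candidate eigenpairs."""
    worst = 0.0
    for lam, v in pairs:
        Wv = [sum(w * x for w, x in zip(row, v)) for row in W]
        worst = max(worst, max(abs(a - lam * b) for a, b in zip(Wv, v)))
    return worst

p, n = 5, 7
W = correlation_matrix(sample_matrix(p, n))
pairs = closed_form_eigenpairs(p)
eigenvalues = [lam for lam, _ in pairs]
```

Since $\cos$ is strictly decreasing on $(0, \pi)$, the $p$ values in `eigenvalues` are pairwise distinct, which is the simplicity property needed for this sample.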

Next we turn to the proof of (ii). We use $X^{(p)}$ to denote the submatrix of $X$ with the
$p$-th row deleted, and use $D^{(p)}$ to denote the $(p-1) \times (p-1)$ upper left corner of $D$.
And we set $S^{(p)} = X^{(p)}X^{(p)T}$, thus one has $W^{(p)} = D^{(p)}S^{(p)}D^{(p)}$. Similar to the
proof of (i), we can prove instead that $SD^2P_1(X)$ and $S^{(p)}(D^{(p)})^2P_1(X)$ have no eigenvalue
in common. It is easy to see that the resultant of the characteristic polynomials
of $SD^2P_1(X)$ and $S^{(p)}(D^{(p)})^2P_1(X)$ is a polynomial of $\{x_{ij}, 1 \le i \le p, 1 \le j \le n\}$.
Therefore, it suffices to show that the resultant is a non-null polynomial. Equivalently,
we shall provide a sample of $\{x_{ij}, 1 \le i \le p, 1 \le j \le n\}$ such that $W$ and $W^{(p)}$ have
no eigenvalue in common.

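Applying (i) to $W^{(p)}$, we can denote the ordered eigenvalues of $W^{(p)}$ by $\lambda_1^{(p)} < \lambda_2^{(p)} < \cdots < \lambda_{p-1}^{(p)}$. By Cauchy's interlacing property, one has

(6.1) $0 \le \lambda_1 \le \lambda_1^{(p)} \le \lambda_2 \le \cdots \le \lambda_{p-1}^{(p)} \le \lambda_p.$

Moreover, we know that $W^{(p)}$ shares the same nonzero eigenvalues with $\mathcal{W}^{(p)}$. So
we can instead provide an example such that $\mathcal{W}$ and $\mathcal{W}^{(p)}$ have no nonzero eigenvalue in
common. Note

(6.2) $\mathcal{W} = \mathcal{W}^{(p)} + y_py_p^T.$

Taking the trace on both sides of (6.2), we obtain

(6.3) $\lambda_1 + \cdots + \lambda_p = \lambda_1^{(p)} + \cdots + \lambda_{p-1}^{(p)} + 1.$

Now we fix $\{x_{ij}, 1 \le i \le p-1, 1 \le j \le n\}$ such that $\lambda_1^{(p)} < \lambda_2^{(p)} < \cdots < \lambda_{p-1}^{(p)}$
and let $\{x_{pj}, 1 \le j \le n\}$ vary. When $\{x_{pj}, 1 \le j \le n\}$ runs through the set $\mathbb{R}^n$,
the ordered nonzero eigenvalues of $\mathcal{W}$ describe the set of families $\lambda_1, \dots, \lambda_p$ of real
numbers obeying (6.1) and (6.3); see the proof of Lemma 11.4 of [3] for example.
Thus it is easy to find a family $\lambda_1, \dots, \lambda_p$ such that

$$\{\lambda_1, \dots, \lambda_p\} \cap \{\lambda_1^{(p)}, \lambda_2^{(p)}, \dots, \lambda_{p-1}^{(p)}\} = \emptyset.$$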
Now we prove (iii). We set $X_{(n)}$ to be the submatrix of $X$ with the $n$-th column
deleted and set


D 



(n) 



\ 



\ 



IIXpll / 



Let $S_{(n)} = X_{(n)}X_{(n)}^T$. It is obvious that $S_{(n)}D_{(n)}^2$ shares the same eigenvalues with
$W_{(n)}$. Now we introduce the polynomial

V 

P2(X) = []||xfc||2.||Xfc||2. 

fc=l 

To prove that $W$ and $W_{(n)}$ have no eigenvalue in common, we only need to show that $SD^2$
and $S_{(n)}D_{(n)}^2$ have no eigenvalue in common. Moreover, if $P_2(X)$ does not vanish, it
is equivalent to prove that the matrices $T := SD^2P_2(X)$ and $T_{(n)} := S_{(n)}D_{(n)}^2P_2(X)$
have no eigenvalue in common. Note that the event $P_2(X) = 0$ has zero Lebesgue
measure. What's more, it is not difficult to see that the entries of $T$ and $T_{(n)}$ are all
polynomials of the elements of $X$, thus the resultant $R(X)$ of the characteristic
polynomials of $T$ and $T_{(n)}$ is also a polynomial of the elements of $X$. Therefore, we
only need to show that $R(X)$ is a non-null polynomial, for which it suffices to give one example
of $X$ such that $W$ and $W_{(n)}$ do not have an eigenvalue in common. For example, we
can choose

$$x_{ij} = \begin{cases} 1, & j = i \text{ or } j = n, \\ 0, & \text{otherwise}. \end{cases}$$

Then we have $W_{(n)} = I_p$ and

$$W = \begin{pmatrix}
1 & \frac12 & \cdots & \frac12 \\
\frac12 & 1 & \cdots & \frac12 \\
\vdots & \vdots & \ddots & \vdots \\
\frac12 & \frac12 & \cdots & 1
\end{pmatrix}.$$

Thus it is easy to see that $W_{(n)}$ and $W$ have no eigenvalue in common since $\det(W - I) \ne 0$,
which implies that $R(X)$ is not a null polynomial, so we conclude the proof. □



7. Appendix B

In this appendix, we prove Lemma 3.7. If we denote

Set

1 

llxill • llxill • fllxill + ||x,;in ' 



By the Hoeffding inequality, we have 



1 ROW \o^o(i) 



n 



(7.1) Ci = + 
holds with overwhelming probability. It is not difficult to see 

(7.2) Vj ■ hn = Vj ■ K - Vj ■ (cixf„, • • • , CpXp„)'^ := Vj ■ K + dj. 
By (|7.ip . we can write dj 

(7.3) dj := ^2^2 " i-^in^ ' ' ' i ^pn) + fj- 
Observe that 

^{Vj ■ hnf = ^{Vj ■ hnf + 2'Y^dj{Vj ■hn)+Y^ d]. 

jeJ j£J jeJ jeJ 

Since $(x_{1n}^3, \dots, x_{pn}^3)^T$ is also a random vector with mean zero and finite variance
entries, Lemma 2.5 can be applied to the first part of the right hand side of (7.3). Thus
if we set the projection

(7.4) $P_J = \sum_{j\in J}v_jv_j^T,$

then we have



1 I T\ kOW 1no-0{i) 

E 4 < 4'^^ • • • • ' + ^ E /I = o(^) + 1? 

with overwhelming probability. Here we have used the fact that for any J 
with overwhelming probability. Since 

/ N 1/2 / N 1/2 

it suffices to prove the following lemma instead. 



Lemma 7.1. Using the notation in Lemma 3.7, we have for any J G {!,••• ,p} 

with \J\ = d < nK~^, 

V^[Y.(^J ■ ^n)^]^^^ = Vd + 0{Klogn) 
with overwhelming probability. 
Proof. Observe that



Now we set 



and 



Vj • — — Vj ' h^i. 



It follows that 



We use the following concentration theorem, which is a consequence of Talagrand's
inequality (see Theorem 69 of [E]).

Theorem 7.2. (Talagrand's inequality). Let $\mathbf{D}$ be the disk $\{z \in \mathbb{C}, |z| \le K\}$. For
every product probability $\mu$ on $\mathbf{D}^N$, every convex 1-Lipschitz function $F : \mathbb{C}^N \to \mathbb{R}$,
and every $r \ge 0$,

$$\mu(|F - M(F)| \ge r) \le 4\exp(-r^2/16K^2),$$

where $M(F)$ denotes the median of $F$.

Remark 7.1. In fact, here we only need the real case of the theorem.

It is easy to see 

is a convex function of the vector Note 

iFjh'J - F{hn)\ ^ \F{h'^)-F{hn)\ 
\Wn-hn\\ W^h'^-^hnW 

where 

?/ _(jAn_ ... h' - (t' ... t' \^ 

~ V 1 1'-: 1 1 ; ' I Iv I r ' n ~ V-^lrn ; -^pn) • 

Since $F(h_n)$ is the norm of a projection of the vector $\sqrt{n}h_n$, it is always 1-Lipschitz
with respect to $\sqrt{n}h_n$. And by the Hoeffding inequality, we also have
1 1 y/nh'n - ^/nhr. 



\h' — h I 

\"'n 



< 2 



with overwhelming probability. So $F(h_n)$ is a 2-Lipschitz function with overwhelming
probability. Thus we can always consider $F(h_n)$ as a 2-Lipschitz function below.
By Theorem 7.2 we have

$$\mathbb{P}(|F(h_n) - M(F(h_n))| \ge r) \le 4\exp(-r^2/64K^2).$$

So to conclude the proof of Lemma 7.1 we only need to show that

$$|M(F(h_n)) - \sqrt{d}| \le 2K.$$

Note that

/ Vn_ \ 

/ MO. II \ 



Vnhn 



hn '■= Dhr 



\ / 

So we have F'^(hn) = n Yjj<^j " ^"P = SjeJ h'^DvjvjDhn := fi^DPjDhn, where 
Pj is the projection defined in ()7.4p . Let DPjD =: {mki)i<k,i<p, then we have 

F'^(hn)= ^ mklXknXln = mkkxl„ + rnklXknXln- 
l<k,l<p k=l i<k=^l<P 

We fix all the variables except • • • , Xpn, so the probabilities and expectations are 
all taken with respect to hn below. Consider the event £^ that Fijin) > \fd + 2i^, 
which imphes F'^{hn) >d + 4:\^K + K'^. It follows that 

p 

< nikkxln >d + 2VdK) + P(| ^ nikiXknXinl > 2VdK). 

k=l i<k=^l<P 

Observe that 

rukkxlJ = Y,mkk = d{l + 

k=l k=l 

holds with overwhelming probability for any small e > 0. Here we have used the 
fact that 

XminiD)TrPj < TrbPjb < X^,,{D)TrPj 

and TrPj = d. By the condition that d < nK~^ , we have 

p 



mkkxl^} = d + o{VdK). 



k=l 

Let Si := Yfk=i ^kk{xlj^ - 1). We have 

nY^rukkxln >d + 2VdK) < F{\Si\ > VdK) < 

k=l 

And by the assumption on K we also have 

p p 
E|5i|2 = ^m2,E(xL - 1)' = Y,ml,(Ext - 1) < dK. 

k=l k=l 

Thus, 

n\Si\ > VdK) < < ^ < 1/10. 
Set $S_2 := |\sum_{k\ne l}m_{kl}x_{kn}x_{ln}|$. Then we have



E5| = 2Y^mli< 2TrDPjD'^PjD < \\D\\%TrPj = 2d{l + 0{—^)). 

k^l 



By Chebyshev's inequality one has

$$\mathbb{P}(S_2 \ge 2\sqrt{d}K) \le 1/10.$$

Similarly, we can define $\mathcal{E}_-$ as the event $F(h_n) \le \sqrt{d} - 2K$ and use

P(f_) < P(5i <d- VdK) + P(S2 > \fdK). 

Both terms on the right hand side can be bounded by 1/5 by the same argument as
above. So we conclude the proof. □